AN  ALTERNATIVE  TO  THE  ORIGINS  OF  LIFE  THEORIES: 
FUNCTIONALLY  ENDOWED  DNA  MOLECULES  CAPABLE  OF 
IMPROVED  CATALYSIS 


A DISSERTATION  PRESENTED  TO  THE  GRADUATE  SCHOOL 
OF  THE  UNIVERSITY  OF  FLORIDA  IN  PARTIAL  FULFILLMENT 
OF  THE  REQUIREMENTS  FOR  THE  DEGREE  OF 
DOCTOR  OF  PHILOSOPHY 


UNIVERSITY  OF  FLORIDA 


ACKNOWLEDGMENTS 


I believe that accomplishment, though in large part an individual's effort to
persevere, is also credited to those who have influenced one in the direction of life he/she
has chosen. I am very fortunate to have many people in my life that have guided me
through these enduring years. My success, if any, is a tribute to them.

I would  like  to  begin  of  course  with  my  family:  James,  Josephine,  and  Kevin  Ang. 
I draw strength from knowing I always have them by my side. My parents taught me that
the pursuit of my goals should coincide with the pursuit of being a better person. I will
always honor my family and conscience first.

Professor Steven A. Benner is my Ph.D. mentor and advisor. The lessons I have
learned in his lab will be invaluable to me in my future as a scientist. One day, I hope to
be able to be as innovative and creative as Professor Benner. I have also appreciated his


Professor  Charlotte  Hum  was  my  first  mentor  and  my  high  school  English 
teacher.  Unfortunately  she  passed  away  during  my  graduate  studies  and  will  never  see 

help others. Her suffering due to cancer, diabetes, and congestive heart failure is a


Professor  Michael  Kasha  was  my  undergraduate  mentor  in  physical  chemistry  at 


University. 


i a regular  basis.  He  has  molded  the  way  I think  about 

Professor  Ken  Goldsby  set  the  standard  for  me  as  being  a great  research  advisor 
and  gave  me  my  first  college  research  experience  in  his  inorganic  chemistry  laboratory  at 
Florida  State  University. 

Dr. Joseph Layon is my longtime physician mentor and friend who is one of the
main reasons why I feel I'm at UF. He has always made sure that I stayed on track.

Professor Al Lewin is the director of the M.D./Ph.D. program and a member of
my committee. Professor Lewin has been extremely supportive as both a research and
graduate student advisor. Because of his commitment, I feel that Professor Lewin has
brought much honor to the M.D./Ph.D. program at the University of Florida.

1 will  always  ap 

who interviewed me for the M.D./Ph.D. program and inspired me

Dr. Petra Burgstaller was my mentor in in vitro selection. Dr. Jeong Ho Park and


very  fortunate  to  have  had  Emma  Westermann-Clark  and  Bernard  Suh  as 
es:  They  taught  m< 

Jurczyk.  Dr.  Tot 


contributed  to  the  organic  chemistry  in  this  project.  Mike  Thomson  was  responsible  for 


its  and  help  taught  the  undergraduates  many  molecular  biology 
techniques  (especially  cloning).  Morgan  Peltier  taught  me  to  use  SAS.  Shaojun  Zhang 
offered  helpful  advice  on  statistical  methods  and  allowed  me  to  pick  his  brain  on 
probability  theory.  I 


Daniel Caraco, Mike Sismour, Gina Cannarozzi, Kevin Devine, Rachelie Childress,
Thomas Giger, Karen Wilson, Daniel Hutter, Vic Otamaa, Stefan Lutz, David Schreiber,
Janos Kodra, and Dave Liberles, and the Stewart group (again, in no particular order)

for  all  making  my  studies  in  Chemistry  Building,  Leigh  452  a great  one  to  remember. 


for  being  the  closest  of  friends. 


TABLE  OF  CONTENTS 



REFERENCES  


BIOGRAPHICAL  SKETCH. 


of  the  University  of  Florida  in  Partial  Fulfillment  of  the 


AN  ALTERNATIVE  TO  THE  ORIGINS  OF  LIFE  THEORIES: 
FUNCTIONALLY  ENDOWED  DNA  MOLECULES  CAPABLE  OF 
IMPROVED  CATALYSIS 


Darwin  Noel  Ang 
August  1999 

Chairman:  Steven  A.  Benner 

Major  Department:  Anatomy  and  Cell  Biology 


This  research  proposes  a novel  hypothesis  to  account  for  the  origins  of  life.  Under 
this  hypothesis,  life  evolved  from  DNA  molecules  endowed  with  additional  chemical 

experiments was a lysine-like thymidine molecule, 5-(3-amino-1-propynyl)-2'-deoxyuridine.
This building block contributes a positive charge at neutral pH that is
were first screened to identify those that could


for  their  ability  to  accept  the  functionally  endowed  nucleoside  triphosphate.  In 
selection  (SELEX™, 


then  used  to  create  catalysts  from  both  endowed  and  non-endowed 


DNA pools. In both cases, a random pool of 10^14 oligonucleotides 40 nucleotides in
length was exposed to a selection pressure that allowed only a few molecules to survive.
The selection pressure favored molecules that could cleave a ribonucleotidic
phosphodiester bond within the backbone of the oligonucleotide. Mutation was


Catalysts  were  obtained  from  both  libraries,  with  the  endowed  library  providing 

generating catalysts were determined by fitting the data from the functionalized and
nonfunctionalized DNA pools to a common statistical distribution, the Weibull distribution.
Weibull distributions (a) allowed for the prediction and subsequent statistical

modelled the effect of tightening the selection pressure on a population of catalysts; and
(c) quantitated the increase in the catalytic potential of a DNA library attributable to the
presence  of  the  ammonium  functionality.  The  calculated  intrinsic  potential  for  the  amino 
acid-like  DNA  pool  was  better. 

Finally,  approximately  eighty  sequences  from  each  SELEX  experiment  were 
determined  and  an  evolutionary  relationship  between  the  sequences  was  proposed.  Both 


convergent and divergent evolution were observed in both pools. These findings support a
new  molecular  candidate  for  a biopolymer  that  might  perform  both  genetics  and  catalysis. 


BACKGROUND 


humankind.  Until  recently,  it  has  remained  a scientific  enigma.  Philosophy  and  Religion 
have  been  the  primary  disciplines  attempting  to  appease  the  intellectual  and  spiritual 


of this century with the Oparin-Haldane hypothesis (Oparin, 1978), which referred to
ideas then emerging from the organic chemical analysis of biological systems, and how
biochemicals might self-aggregate to form structures.


Only  after  the  elucidation  of  the  structure  of  genetic  and  catalytic  biopolymers 
was it possible, however, to address the "chicken or the egg" question. This question
arises from the fact that contemporary life uses two macromolecules, proteins and nucleic
acids, to perform biological function, and that each appears to be required for the
synthesis of the other. The probability of either proteins or nucleic acids arising
spontaneously from a prebiotic soup is low enough. It is inconceivable, however, that

production  of  the  other. 

The  generic  solution  to  this  problem  is  based  on  the  assumption  that  one  of  the 
two biopolymers was able to survive without the other. This implies that either nucleic
acid or protein, alone and without the other, was able to perform both catalytic function
and genetic function, presumably assisted by small molecules (including lipids) that
would have emerged spontaneously from abiotic processes. These two


summarized in the form of two hypotheses: the "RNA World Hypothesis" and the
"Protein World Hypothesis".

Each hypothesis assumes that life began with its respective biomolecular

the two hypotheses are at opposite ends of the spectrum (Figure 1). The RNA World
hypothesis, proposed originally in 1962 by Alexander Rich (Rich, 1962), was supported
by the more recent discovery of ribozymes, RNA molecules capable of catalyzing biochemical
reactions (Gesteland et al., 1999). The independent discoveries by Thomas Cech and
Sidney Altman that RNA can function as an enzyme in biological systems disproved the
then-prevailing view that oligonucleotides were involved only in information storage.
Ribozymes contribute to the pathogenesis of certain parasites (Ferre-D'Amare et al.,
1995), aid in the processing of genetic information (Green et al., 1998), and are even now
being sought as molecular tools to add to our armamentarium in genetic therapy.

The fact that RNA is capable of enzyme-like catalytic activity suggests that an
RNA molecule performing both genetic and catalytic functions could have been the
precursor of life (Joyce, 1989). Major weaknesses of this hypothesis remain, however.
For example, the phosphodiester backbone of RNA molecules is readily prone to
hydrolysis due to the 2'-hydroxyl group on the ribose ring. Further, RNA molecules are
relatively slow catalysts (Bartel and Szostak, 1993).


he  hypothesis  proposed  by 


Figure 1: [diagram not recoverable; surviving labels: LIFE, Hypothesis]


The  Protein  World  hypothesis,  on  the  other  hand,  assumes  that  life  began  with  a 


biopolymer (proteins) already known to provide excellent catalysts. In fact, when we
think of biochemical catalysis, protein enzymes are often the first things to come to mind.
With the recent discovery of prions by Prusiner (Prusiner, 1997), this hypothesis has been
taken seriously as a possible model for the origin of life. Prions may be the cause of a
class of diseases known as spongiform encephalopathies (Prusiner, 1997). In their mechanism
of reaction, prions are believed to be proteins that template the folding of other proteins,
thereby "self-reproducing" their conformation (although not directing their own
synthesis).

Recently, Reza Ghadiri's laboratory at the Scripps Research Institute synthesized
peptides that could direct their own synthesis (Lee et al., 1996), at least if suitable
precursors were provided. These behave in a hypercyclic network (Lee et al., 1997), a
collection of two or more molecular systems that are able to aid in the synthesis of each
other (Eigen, 1971). The self-replicating peptides catalyze a template-directed
homodimerization of two peptide fragments. Templating relies on the common
hydrophobic interaction between certain proteins that leads to a tertiary structure
assembled from alpha helices (Lee et al., 1996). The ligation involves the formation of an


While intriguing as an example of peptide-based replication, Ghadiri's system is
still far from "living". First, the chemistry is limited in scope. Further, the two chemical
groups (the amine and thioester) involved in the synthesis reaction are very reactive.


n.  However,  this  novel  idea  and  set  of 


In  light  of  the  deficiencies  in  both  hypotheses,  a third  hypothesis  was  considered, 
and  is  examined  by  the  research  reported  in  this  dissertation.  This  hypothesis  holds  that 
an  early  version  of  life  used  nucleic  acid  molecules  that  were  endowed  with  protein-like 
functionality.  The  functionality  was  covalently  attached  to  nucleoside  building  blocks. 


and  the  evolvability  evident  in  RNA  molecules.  Further,  they  may  have  some  of  the 
catalytic  potential  displayed  by  proteins.  If  we  were  to  again  consider  the  "chicken  or 
egg"  analogy,  where  for  obvious  reasons  RNA  is  thought  to  represent  the  "egg”  and 

the  middle,  bu 

Under this hypothesis, the primitive biopolymer carried a broader range of
"responsibility"  than  carried  by  either  proteins  or  nucleic  acids  in  contemporary  life 
forms.  This  biopolymer  therefore  is,  at  a molecular  level,  more  "complex"  than 


le  to  mind:  What  must  a complex  molecule  look  like  to 


conditions?  Finally,  if  life  actually  emerged  based  on  a single  biopolymer  system,  can 
chemical "fossils" of these complex biopolymers be found in modern metabolism?

These  questions  correspond  to  chemical  questions.  Would  the  combination  of 
genetic-like  structures  and  reactive  amino  acid-like  functional  groups  lead  to  a molecule 

molecules  arise  under  non-biological  conditions  such  as  those  believed  to  be  present  on 
early  Earth? 

In this work, a 2'-deoxyuridine derivative was prepared with a functional group


and  Miller,  1995;  Robertson  and  Miller,  1995). 

For some time, the prebiotic atmosphere was believed to have been a mixture of CH4 and N2,
NH3 and H2O, or possibly CO2, H2, and N2 (reducing atmosphere). In a reducing

as  (Stribling  and  Miller,  1987),  but  also  including  uridine  and  formaldehyde,  two 


reducing, but instead was made up largely of CO2, N2, H2O (Lazcano et al.,


prebiotic atmosphere. Reducing conditions may have prevailed elsewhere on the
primitive Earth, however. Either on or off Earth, the Robertson and Miller experiments
provide one model for how functionalized nucleosides might have arisen on
early Earth.

Are  there  signs  (or  fossil  remains)  that  functionalized  nucleotides  ever  existed? 
Functionalized nucleotides are actually known in contemporary terran biological
systems. The best examples of this are the more than 80 modified nucleosides
characterized from tRNA (Bjork, 1995). Many of these nucleosides are endowed with
elaborate functionality. Even though these functionalized nucleosides play an integral
role in translation, particularly as "identity determinants" for many aminoacyl-tRNA
synthetases (Bjork, 1995), their roles in vivo are not completely elucidated, and it is
currently assumed that each type of functionality plays alternative roles.

With  results  from  prebiotic  chemistry  and  the  presence  of  functionalized 
nucleosides  in  our  own  physiology,  a functionalized  DNA  World  hypothesis  is 
conceivable.  Of  course,  the  true  measure  of  this  hypothesis  will  be  whether  or  not 
functionalized  DNA  makes  a better  candidate  for  the  single  biopolymer  life  form.  The 


The method used in this research to create both functionalized and
nonfunctionalized DNA catalysts is known as in vitro selection or SELEX (Systematic
Evolution of Ligands by Exponential Enrichment) (Irvine et al., 1991) (Figure 3). It is not
clear who invented this method first. It is commonly attributed to three pioneering
researchers who developed this field of science in the early 1990s: Jack Szostak
(Szostak, 1988) at the Massachusetts General Hospital, Gerald Joyce (Joyce, 1989) at the
Scripps Research Institute, and Larry Gold (Irvine et al., 1991) of the University of
Colorado. In vitro selection begins with a large pool of random DNA molecules of the same
length (e.g., 10^14 112-mers). The pool is
exposed to an experimentally induced selection pressure that singles out specific
exposed  to  an  experimentally  induced  selection  pressure  that  singles  out  specific 


some  reaction,  or  perform  some  other  specified  task.  The  molecules  that  do  not  survive 
the selection are eliminated. The few molecules that do survive are collected and
amplified  by  PCR  (Polymerase  Chain  Reaction)  methods.  The  amplified  products  are 
then  reintroduced  into  the  selection  scheme. 

Eventually, mutations are introduced via error-prone PCR, allowing the DNA
molecules to evolve. Mutations that are beneficial, or that do not affect the selected
phenotype of the molecule, survive the selection. Mutations that diminish the property
demanded by the selection scheme are eliminated (Figure 3).
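
The select, amplify, and mutate cycle described above can be illustrated with a toy simulation. This is only a minimal sketch under loud assumptions: "activity" is a hypothetical stand-in score (the best match of a sequence to an arbitrary motif, `TOY_MOTIF`), the pool is tiny compared with the 10^14 molecules used experimentally, and the keep fraction and mutation rate are invented for illustration. None of these names or values come from the dissertation.

```python
import random

BASES = "ACGT"
# Hypothetical motif: an arbitrary stand-in for "catalytic activity".
TOY_MOTIF = "GGCTAGC"


def random_pool(size, length, rng):
    """Build a pool of random sequences, all of the same length."""
    return ["".join(rng.choice(BASES) for _ in range(length)) for _ in range(size)]


def toy_activity(seq, motif=TOY_MOTIF):
    """Score in [0, 1]: best fraction of motif positions matched in any window."""
    best = 0
    for start in range(len(seq) - len(motif) + 1):
        window = seq[start:start + len(motif)]
        best = max(best, sum(a == b for a, b in zip(window, motif)))
    return best / len(motif)


def error_prone_copy(seq, mutation_rate, rng):
    """Copy a sequence, redrawing each base at random with the given probability."""
    return "".join(
        rng.choice(BASES) if rng.random() < mutation_rate else base for base in seq
    )


def selection_round(pool, keep_fraction, mutation_rate, rng):
    """Keep the top-scoring survivors, then amplify them back to the pool size."""
    ranked = sorted(pool, key=toy_activity, reverse=True)
    survivors = ranked[:max(1, int(len(pool) * keep_fraction))]
    return [
        error_prone_copy(rng.choice(survivors), mutation_rate, rng)
        for _ in range(len(pool))
    ]


def mean_activity(pool):
    return sum(toy_activity(seq) for seq in pool) / len(pool)


def run_selection(rounds=5, pool_size=300, length=40,
                  keep_fraction=0.1, mutation_rate=0.01, seed=0):
    """Return the mean toy activity before selection and after each round."""
    rng = random.Random(seed)
    pool = random_pool(pool_size, length, rng)
    history = [mean_activity(pool)]
    for _ in range(rounds):
        pool = selection_round(pool, keep_fraction, mutation_rate, rng)
        history.append(mean_activity(pool))
    return history


if __name__ == "__main__":
    for round_number, value in enumerate(run_selection()):
        print(f"round {round_number}: mean toy activity = {value:.3f}")
```

In this sketch the mean score rises over rounds because survivors are copied preferentially, while the error-prone copying step keeps introducing variants that the next round can either retain or discard.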

From this simple method, an impressive array of oligonucleotide enzymes has
been selected, particularly novel ribozymes. The first selected molecules were not
catalysts. Rather, they were RNA and DNA molecules created as receptors for the
purposes of binding to different ligands ranging from dyes to ATP and other biological
cofactors (Ellington and Szostak, 1992; Huizenga and Szostak, 1993; Burgstaller and
Famulok, 1994; Geiger et al., 1996; Burke and Gold, 1997). Although the dissociation
constants (Kd) for these oligomers were relatively high, making them poor receptors, the
potential for new receptors spurred excitement in the scientific community. The term
"aptamers" is used to describe these RNA receptors (Ellington and Szostak, 1992).


Figure 2: 5-(3-amino-1-propynyl)-2'-deoxyuridine, or J-base


The evolution in the science of aptamers has taken on several forms. Researchers
began by seeing in vitro selection as a way of making receptors to biological molecules.
Aptamers that bound to thrombin, at least in vitro, decreased the rate of formation of
fibrin clot (Bock et al., 1992). Several other aptamers bound to elastase (Smith et al.,
1995); these proved to be modest inhibitors of elastase activity. Currently, these
inhibitors are being tested in vivo.

The need to have stable aptamers for application in mammalian systems led to the
next stage in the evolution of aptamer research. RNA molecules are relatively unstable.
RNA will hydrolyze spontaneously. Further, ribonucleases are ubiquitous in any in vivo
environment. For both reasons, RNA aptamers provided exogenously have difficulty
finding a serious role as therapeutic agents. To circumvent this problem, researchers
created the mirror image of RNA oligomers. These enantiomers were found to be
resistant to RNA nucleases (Nolte et al., 1996).

Ribozymes created via in vitro selection methods did not attract much attention
until David Bartel and Jack Szostak (Bartel and Szostak, 1993) created the first
self-ligating ribozyme. This landmark experiment showed how powerful in vitro selection
could be as a tool for creating actual enzyme catalysts. Chemists and biologists alike
began to ask how far the biology and chemistry of an oligonucleotide might go when
subjected to natural selection and evolution (Jaeger, 1997). The implication that
ribozymes could be intentionally selected in the laboratory to self-ligate led to the search
for broader catalysis. In particular, it was asked whether self-polymerization might be
possible and, more specifically, whether a self-replicating nucleic acid (a replicase) could
be found. Is the scope of RNA catalysis able to go outside of phosphoester chemistry? And
what types of molecules (based on function) need to be synthesized in order to provide
proof of concept for the RNA World hypothesis?

(Wilson and Szostak, 1995), polymerization of triphosphates (Ekland and Bartel, 1996),
self-aminoacylating RNA (Illangasekare et al., 1997), synthesis of peptide bonds (Zhang
and Cech, 1997), synthesis of nucleotides (Unrau and Bartel, 1998), and two examples of
ribozymes capable of synthesizing carbon-carbon bonds via the Diels-Alder reaction
(Tarasow et al., 1998; Seelig and Jaschke, 1999).


It is evident that the range of in vitro selected catalysts and aptamers has been
broad.  Much  less  effort  has  been  devoted  to  asking  how  nucleic  acid-based  catalysts 


reaction  mechanisms  may  be  followed  in  RNA-based  catalysis.  These  have  exploited 
microscopic  rate  constants  and  substrate  modification. 


Joyce, 1998; Esteban et al., 1998; Lott et al., 1998). Generally, it appears that the


other cases, metal ions are known to play a structural role (Hampel and Cowan, 1997).


Nevertheless,  an  overwhelming  number  of  these  nucleoenzymes  do  require  a metal 

The first work has now appeared describing the crystal structures of naturally
occurring ribozymes such as the Group I ribozyme domain (Cate et al., 1996), the


t no requirement for metal (Geyer and Sen, 1997).


hepatitis delta virus ribozyme (Ferre-D'Amare et al., 1998), the Tetrahymena ribozyme
(Golden et al., 1998), and the hammerhead ribozyme (Scott et al., 1995; Scott et al.,
1996) (Doudna, 1997). This offers a new, not yet fully exploited, approach to understand
the catalytic mechanisms of nucleoenzymes.


a catalyst of a particular power appears in an oligonucleotide population. This is not a
straightforward question to answer. It is impossible to search all of the possible sequence
combinations for even modest-sized oligomers, say 100-mers, which number
approximately 1.6 x 10^60 molecules. To put this number into perspective, the Universe
has only ca. Ifl"1 hydrogen molecules in the Milky Way Galaxy (Britannica,
1986). Obviously, only a small fraction of the possible sequences can be
sampled for catalysis.
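
The size of sequence space quoted for 100-mers follows from counting: with four possible nucleotides at each position, an oligomer of length n has 4^n possible sequences. A minimal sketch of that arithmetic is given below; the function names are illustrative only and are not taken from the dissertation.

```python
import math


def sequence_space_size(length, alphabet_size=4):
    """Exact number of distinct sequences of the given length."""
    return alphabet_size ** length


def log10_sequence_space(length, alphabet_size=4):
    """Base-10 logarithm of the sequence space size (safe for any length)."""
    return length * math.log10(alphabet_size)


if __name__ == "__main__":
    for n in (40, 100, 220):
        print(f"{n}-mers: about 10^{log10_sequence_space(n):.1f} sequences")
```

Working with the logarithm avoids formatting very large integers and makes it easy to compare sequence spaces of different lengths on one scale.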


of ribozymes he discovered compared to the number of sequences he sampled would give
a quantitative statement that says a pool of 10^14 RNA 220-mers must be sampled before
one catalyst can be found that is capable of ligating itself. For RNA-cleaving DNA
molecules, one only needs to search a pool of 10^13 100-mers before finding one molecule


capable of hydrolyzing an RNA molecule (Breaker and Joyce, 1995). At first glance, it
seems obvious that, because the ratio of the number of catalysts found to the number of
oligonucleotides in the pool is much higher for RNA-cleaving molecules, RNA
hydrolysis can be found more easily than RNA ligation. But is this really true? In
sequence space, Breaker sampled 1 x 10^13 molecules out of a possible 1 x 10^60, while


Bartel only searched 1 x 10^14 out of a possible 1 x 10^132. In fact, if we compare these
ratios, we find that the search done by Breaker in sequence space is several tens of orders
of magnitude (10^71) better than Bartel's. Of course, this is complicated by the fact that the

therefore requires a longer oligomer. In reality, there should be no significant
difference, since the ligation of RNA's phosphodiester bond is, chemically, the
microscopic reverse of the hydrolysis of RNA's phosphodiester bond. In
principle, one should be able to find a ligase from a pool of 10^13 100-mers just as easily
as an RNA phosphodiesterase.
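
The comparison of the two searches can be made concrete by computing, on a log scale, what fraction of sequence space each pool covers. The sketch below is illustrative only. The pool sizes (10^13 molecules of 100-mers for the RNA-cleaving selection, 10^14 molecules of 220-mers for the ligase selection) are assumptions read from a partly illegible passage, and the function names are hypothetical.

```python
import math


def log10_sequence_space(random_region_length, alphabet_size=4):
    """log10 of the number of possible sequences of the given length."""
    return random_region_length * math.log10(alphabet_size)


def log10_coverage(log10_pool_size, random_region_length, alphabet_size=4):
    """log10 of the fraction of sequence space represented in a pool."""
    return log10_pool_size - log10_sequence_space(random_region_length, alphabet_size)


if __name__ == "__main__":
    # Assumed pool sizes and lengths (see the lead-in above).
    cleaver = log10_coverage(13, 100)
    ligase = log10_coverage(14, 220)
    print(f"RNA-cleaving search covers about 10^{cleaver:.1f} of sequence space")
    print(f"ligase search covers about 10^{ligase:.1f} of sequence space")
    print(f"difference: about {cleaver - ligase:.0f} orders of magnitude")
```

Under these assumed inputs the shorter pool samples its sequence space far more densely, which is why a raw count of catalysts per pool says little about how hard each activity is to find.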

In a review article, Ronald Breaker introduces the concept of how many
contiguous sequences can be represented in a stretch of oligonucleotides with the
mathematical transformation (x - y) + 1, where x is the number of nucleotides in the
random region and y is the number of nucleotides of interest (Breaker, 1997). With this

effectively  represent,  not  only  the  length  of  the  o! 
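
The transformation (x - y) + 1 counts the number of positions at which a contiguous stretch of y nucleotides can start within a random region of x nucleotides. A minimal sketch follows; the function names and the example sequence are hypothetical and serve only to check the count by enumeration.

```python
def contiguous_windows(random_region_length, motif_length):
    """Number of contiguous stretches of motif_length in a region: (x - y) + 1."""
    if random_region_length <= 0 or motif_length <= 0:
        raise ValueError("lengths must be positive")
    if motif_length > random_region_length:
        return 0
    return (random_region_length - motif_length) + 1


def enumerate_windows(sequence, motif_length):
    """List every contiguous stretch of motif_length nucleotides in a sequence."""
    return [
        sequence[start:start + motif_length]
        for start in range(len(sequence) - motif_length + 1)
    ]


if __name__ == "__main__":
    # A 40-nucleotide random region contains 31 overlapping 10-nucleotide stretches.
    print(contiguous_windows(40, 10))
    print(enumerate_windows("ACGTACGT", 3))
```

Each molecule in a pool therefore samples several shorter sequences at once, so a pool effectively represents more short motifs than its molecule count alone suggests.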


The  goal  of  this  dissertation  was  to  develop  the  tools  needed  to  do  in  vitro 
selection  with  functionalized  nucleotides,  to  apply  these  to  generate  a functionalized 
DNA  catalyst,  and  to  develop  the  quantitative  tools  to  assess  how  much  the  addition  of 
protein-like functionality improved the catalytic potential of nucleic acids. The
dissertation work began with the screening of polymerases, the from the biochemical data
supporting  an  alternative  single  biopolymer  life-form,  to  mathematical  models  describing 
the  improvement  of  functionalized  DNA  over  nonfunctionalized  DNA.  This  is  due  to  the 


fact that a novel field within science is being proposed wherein a multidisciplinary
approach is necessary. Structural biology, probability theory, and the biochemistry of
DNA are among the few topics addressed in this research. But it is worth noting that

the  question:  Is  it  possible  to  improve  DNA? 

The  functionalization  introduced  into  the  it 
here is 5-(3-amino-1-propynyl)-2'-deoxyuridine (Figure 2). This molecule carries a
lysine-like amino group attached to a 2'-deoxyuridine nucleotide, and was given the
trivial name dJ. At physiological pH, the amino group is protonated, and therefore carries
a positive charge. This functional group was chosen because natural nucleotides lack
chemical groups carrying a positive charge. With the introduction of a positively charged
amino group, DNA's electrostatic properties are expanded, and should make DNA a


as  designed  to  create  a catalyst  that  cleaves 
the phosphodiester bond of an RNA. To control the position where cleavage occurs,

51  nucleotides  in  length  containing  a single  ribonucleotide.  The  ribonucleotide  was 


The  library  consisted  of  10,J  molecules  1 12  nucleotides  long.  The  molecules  in 
the library were designed for the purposes of PCR amplification. Each molecule
possessed a 40-nucleotide-long random region flanked by two defined primer binding
sites. At the 5'-end of the strand of interest, a biotin molecule was appended so that the
DNA strand could be immobilized on a streptavidin column.


Molecules that were able to catalyze the cleavage of an RNA-DNA phosphodiester
bond could release themselves from the biotin tag


column.  These  molecules  were  said  to  have  "survived"  the  selection,  and  were  collected 
and reamplified. Molecules that could not catalyze the cleavage remained immobilized on
the support and were subsequently discarded.


The biotinylated primer and ribonucleotide were reintroduced during PCR of the
survivors, and the molecules were repeatedly exposed to the selection. Figure 4




The first step towards performing in vitro selection using functionalized nucleotides
required the identification of a polymerase that would incorporate dJ via template-directed
polymerization, especially during PCR. The initial method used in this
study employed random screening of commercial polymerases. These were chosen from
both Family A and Family B polymerases, two evolutionary families of polymerases
defined by a phylogeny created by Braithwaite and Ito (Braithwaite
and Ito, 1991). This classification is based on the alignment of protein sequence data.
The families A, B, and C correspond to DNA polymerase I, II, and III

For  those  polymerases  that  showed  promise,  kinetic  studies  were  done  to 
quantitate  the  impact  of  the  functionality  on  the  ability  of  the  polymerase  to  accept  a 
nucleotide triphosphate. These polymerases were also probed using a modified form of
the J-base, wherein the amino group was protected as its phenylacetyl amide. This
protected side chain lacked the positive charge of the amino group.

A series of molecular modelling experiments was done by Bernard Suh to build
hypotheses concerning contacts between specific amino acids in the polymerase active
site and the functional group appended to the nucleotide. The sequence of a family B
polymerase (Vent, used for the kinetic study) was threaded onto the crystal structure of a
highly homologous (> 80% identical), but not commercially available, family B
polymerase (from Thermococcus gorgonarius). The geometry was subjected to a local energy
minimization.

Finally,  in  an  effort  to  confirm  hypotheses  based  on  molecular  modelling,  key 
amino  acid  residues  were  mutated  by  J.  Michael  Thomson  on  both  family  B and  family  A 
polymerases,  and  kinetic  analyses  were  performed. 


EXPERIMENTAL  DATA  AND  RESULTS 




This research posed a unique challenge, since it is the first time modified
nucleotides would be employed to create deoxyoligonucleotide catalysts. As a result, a
method had to be developed to incorporate modified nucleotides by polymerases.
Identifying polymerases that incorporated the dJ-base via template-directed
polymerization, and assessing the kinetic behavior of these polymerases, was an important
first step, as well as the rate-limiting step, for these experiments.

Primer  Extension  Screening  and  PCR  Amplification:  An  Evolutionary  Predilection 
Resolved 

Many  polymerases  were  screened  and  several  polymerases  were  found  to 


observation  was  made.  There  appeared  to  be  a correlation  between  the  ability  of  a 
polymerase  to  incorporate  the  dJ-base,  and  its  evolutionary  pedigree.  All  Family  B 


product.  In  contrast,  most  Family  A polymerases  produced  truncated  products  (Figure  5). 


polymerases screened were able to use the dJ-base as a substrate and create full-length


Thermophilic  polymerases  were  screened  by  PCR  to  determine  whether  or  not 
they  could  repeatedly  use  the  dJ-base  as  a substrate  for  the  purpose  of  amplification. 
From  these  experiments,  it  was  shown  that  Vent  was  the  best  polymerase  from  the 
several  thermophiles  screened.  dJTP  was  incorporated  in  a PCR  experiment  by  Vent 
polymerase  only  ca.  70%  as  efficiently  as  TTP  (Figure  6b),  as  judged  by  a comparison  of 
the  rate  of  increase  in  the  intensities  of  full  length  product  bands  on  an  ethidium  bromide 
stained agarose gel (Figure 6a). Polymerase chain reactions that lacked either the TTP or
dJTP nucleotide were run in parallel and served as negative controls. In both cases, no
full-length product was seen after eight cycles of PCR amplification.

Determining  the  Kinetic  Behavior  and  Substrate  Modification 

The kinetic behavior of Vent exo- polymerase accepting dJTP as a substrate was

designed and performed as described by Creighton and Goodman (Creighton, 1995). The
extended primer and its products were quantitated by phosphorimaging the 32P-labeled

The amount of the full-length product was interpreted using the equation

$I_T / I_{T-1} = V_{max}[dNTP] / (K_M + [dNTP])$,

where $V_{max}$ is the relative maximum velocity for
incorporation of TTP or dJTP at the target site A. The kinetic data were plotted assuming
Michaelis-Menten behavior (Figure 7). The results are collected in Table 1.
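
Fitting rate data to Michaelis-Menten behavior, as described above, can be sketched in a few lines. The example below is a minimal illustration, not the analysis used in this work: it uses the Hanes-Woolf linearization ([S]/v = [S]/Vmax + Km/Vmax) with ordinary least squares, and the concentrations and kinetic constants are synthetic values invented purely to exercise the code. A nonlinear fit would normally be preferred for noisy experimental data.

```python
def michaelis_menten(substrate, vmax, km):
    """Rate predicted by the Michaelis-Menten equation."""
    return vmax * substrate / (km + substrate)


def fit_hanes_woolf(concentrations, rates):
    """Estimate (vmax, km) from the line [S]/v = [S]/vmax + km/vmax."""
    xs = list(concentrations)
    ys = [s / v for s, v in zip(concentrations, rates)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope       # slope is 1 / vmax
    km = intercept * vmax    # intercept is km / vmax
    return vmax, km


if __name__ == "__main__":
    # Synthetic, noise-free data (hypothetical values, arbitrary units).
    true_vmax, true_km = 0.7, 5.0
    concentrations = [1.0, 2.0, 5.0, 10.0, 20.0, 50.0]
    rates = [michaelis_menten(s, true_vmax, true_km) for s in concentrations]
    vmax, km = fit_hanes_woolf(concentrations, rates)
    print(f"fitted Vmax = {vmax:.3f}, fitted Km = {km:.3f}")
```

With real gel-derived ratios in place of the synthetic rates, the fitted Vmax and Km for each substrate could be compared directly, which is the kind of comparison summarized in Table 1.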




The relative apparent Vmax of the polymerase with dJTP as substrate was 70% that
of the apparent Vmax for TTP. To determine whether the difference in apparent Vmax values
could be attributed to either the presence of the positive charge from the amino group or
the mere presence of the side chain, analogous kinetic experiments were performed with
a dJTP derivative where the amino group was protected as the uncharged phenylacetyl
amide. The apparent Vmax of the protected derivative was the same as that for TTP; the
apparent KM of the protected derivative was approximately 2.5-fold higher than that for
TTP. Table 1 shows the kinetic values for Vent exo- and the three separate
substrates: TTP, dJTP, and protected dJTP (Copeland, 1995).

This result was surprising, as the side chain of the protected form of dJTP is
bulkier than the side chain of dJTP itself. These results suggest that charge, not bulk,
accounts for the diminished potential of dJTP to serve as a substrate for Vent polymerase.

A Proposed Mechanism and Crystal Structure for the Vent Polymerase and J-base Interaction

The evident correlation between the ability of a polymerase to accept dJTP and its
evolutionary pedigree prompted a closer examination of the crystal structure of the
polymerase active site (Wang et al., 1997). All Family B polymerases have three aspartate
residues in the active site. These have been theorized to be responsible in part for
catalysis of the formation of the phosphodiester bond (Blanco, 1995; Wang et al., 1997).
Two of these are in Region I; one is in Region II. Most of the Family A polymerases,
however, have only one aspartate in Region I and one in Region II.


Figure 7: Michaelis-Menten plot for Vent exo- polymerase. Hyperbolic curves fit to the
kinetic data for the Vent exo- polymerase with TTP, dJTP, and PTP (protected or
neutralized functionalized base). The red curve represents the functionalized base, the
green represents protected JTP, and the blue represents the nonfunctionalized substrate,
TTP.


An alignment of the active sites of the polymerases used in this project and other
polymerases of interest is shown in Figure 8. The first 12 polymerases are from Family
B; the last 11 are from Family A. The sequences were obtained from
GenBank and an initial alignment was performed by CLUSTAL W. Because of low
sequence similarity between the families, the alignment in Wang et al.
(1997) was used to guide the alignment used here.

Inspection  of  the  crystal  structure  suggested  the  following  model  to  explain  the 
ability  of  Family  B polymerases  to  accept  dJTP,  and  the  inability  of  Family  A 
polymerases  to  do  so.  In  this  model,  the  third  aspartate  in  Region  I of  the  Family  B 
polymerases  is  expendable  for  catalysis.  It  can  therefore  interact  electrostatically  with  the 
positively  charged  amino  functional  group  of  dJTP.  This  interaction  occupies  the 
attentions  of  the  positive  charge  of  the  amino  group,  which  (so  occupied)  does  not  form  a 
coulombic  interaction  with  one  of  the  other  two  aspartate  residues,  both  of  which  are 
essential for catalysis (Steitz, 1998). In contrast, in Family A polymerases, there is no
expendable aspartate to interact coulombically with the ammonium group of dJTP.
Therefore,  the  ammonium  group  of  dJTP  interacts  with  one  of  the  critical  aspartates, 
inhibiting  catalysis. 

The  simple  prediction  of  this  model  is  that  Family  A polymerases  would  accept 
protected  dJTP.  A recent  report  by  Barbas  and  his  coworkers  shows  that  this  is  the  case 
for a similar substrate, 3-pyridynyl-2-enone deoxyuridine triphosphate (Sakthivel and
Barbas, 1998).


would lower the affinity of the 3'-OH group of the primer, allowing the oxygen to
perform a nucleophilic attack on the alpha phosphate of the triphosphate. The metal ion
coordinated to aspartate residue 545 would facilitate the pyrophosphate in leaving.
According to Steitz, the two metal ions stabilise the structure and charge of the expected


the J-base interaction in the active site of the Vent polymerase was created by threading
the Vent polymerase sequence to a highly homologous and recently crystallized Family B
polymerase, that of Thermococcus gorgonarius (Tgo) (Hopfner, 1999); see Figure 9. The
Tgo polymerase has over 80% sequence identity to the Vent polymerase and its active
site for catalysis is identical. Sequence alignment and identification of the active site was

CLUSTALW  (Thompson,  1994). 

As  indicated  by  the  proposed  structure,  the  functional  group  of  the  J-base  is  in 
closest  proximity  to  the  Asp  543.  An  electrostatic  interaction  is  highly  plausible.  The 


, or  specifically,  the  coordination  of  the  metal  ion.  It  is  visually 


Figure  9:  The  family  B active  site  of  catalysis  (or  catalytic  pocket)  showing  the  J-base 
functional  group  (purple)  in  close  proximity  to  the  aspartate  residues  (yellow).  The 
aspartate  residue  closest  in  proximity  to  the  J-base  amino  propynyl  arm  is  aspartate  543. 
The  green  molecules  are  primer  template  and  the  white  lines  represent  the  general 


Polymerase  Mutagenesis  and  Functional  Analysis 


To  test  the  hypothesis  that  the  "third"  Asp  residue  is  important,  if  not  essential, 
for  the  incorporation  of  dJTP,  mutations  in  the  active  site  of  Taq  polymerase  were 
introduced. Taq (Thermus aquaticus) was chosen as a representative thermophilic
polymerase from Family A. It was an ideal candidate because it did not incorporate the
J-base during primer extension screening and a crystal structure of Taq existed (Kim et al.,
1995).

In  the  first  mutant,  Taq  (V583D),  an  aspartate  was  introduced  at  the  position 
analogous  to  the  position  holding  the  third  aspartate  in  Vent.  The  second  Taq  mutant, 
Taq  (V583D  E786G),  was  a corresponding  double  mutation  where  Val  583  was  replaced 
by  an  aspartate,  and  Glu  786  by  a glycine. 

Effectively,  the  double  mutations  placed  in  the  active  site  of  Taq  the  conserved 
residues from the Vent active site. If the model outlined above is correct, then the Taq
polymerase  should  be  able  to  accept  dJTP.  In  the  single  mutant  variant  of  Taq,  an  extra 
negatively  charged  residue  is  present  in  the  active  site.  There  is  no  prior  literature  to 
generate  expectations  regarding  the  outcome  of  this  experiment. 

Running  start  experiments  were  done  using  both  the  Taq  (V583D)  and  Taq 
(V583D E786G) and the primer template combination shown in Figures 10 and 11 over a
range of typical triphosphate concentrations. The Taq (V583D) single mutant was not
able  to  incorporate  either  dJTP  (beyond  a small  amount)  or  TTP.  The  Taq  (V583D 
E786G)  variant  was,  however,  able  to  incorporate  TTP  with  an  apparent  Vmax  only  ca.  2 
fold  less  than  that  for  TTP  displayed  by  the  wild  type  Taq  polymerase. 


s,  a simple  model  notes  that  the  Taq  (V583D)  variant  has 


two anionic groups in the active site binding pocket, the one there naturally (Glu 786),

either  position  583  or  position  786  generates  an  active  enzyme,  but  placing  anionic 
groups  at  both  destroys  activity  possibly  through  coulombic  repulsion. 

The  model  cannot  be  so  simple,  however,  as  the  Taq  (V583D)  polymerase  had 
sufficient  catalytic  activity  that  it  could  incorporate  two  guanosines  in  the  running  start 
experiment. Thus, the two negative charges in the same active site evidently do not


Interestingly, the double mutant did show a trace amount of incorporation of dJTP
at high concentration. This trace was not observed with wild type Taq. While these
results require further examination, this is a finding that deserves further exploration.


Figure 10. Primer extension at various concentrations of target nucleotide TTP and dJ for
purified Taq wild type; and primer extension at various concentrations of dJ for Taq

before the target nucleotide dJTP. (A) Wild type Taq: lane 1 10 µM TTP, lane 2 60 µM
TTP, lane 3 100 µM TTP, lane 4 10 µM dJTP, lane 5 60 µM dJTP, lane 6 100 µM dJTP.
(B) Taq single mutant: lane 1 primer, lane 2 0.1 µM dJTP, lane 3 1 µM dJTP, lane 4 2
µM dJTP, lane 5 3 µM dJTP, lane 6 7 µM dJTP, lane 7 14 µM dJTP, lane 8 25 µM dJTP,
lane 9 40 µM dJTP, lane 10 60 µM dJTP, lane 11 100 µM dJTP. (C) Taq double mutant:
lane 1 primer, lane 2 0.1 µM dJTP, lane 3 1 µM dJTP, lane 4 2 µM dJTP, lane 5 3 µM
dJTP, lane 6 7 µM dJTP, lane 7 14 µM dJTP, lane 8 25 µM dJTP, lane 9 40 µM dJTP,
lane 10 60 µM dJTP, lane 11 100 µM dJTP.


A.  Taq  wild  type  TTP  and  dJTP 


B.  Taq  single  mutant  (V583D)  with  dJTP 


C.  Taq  double  mutant  (V583D  E786G)  with  dJTP 




twice before the target nucleotide dJTP. (A) Taq single mutant: lane 1 primer, lane 2 0.1
µM TTP, lane 3 1 µM TTP, lane 4 2 µM TTP, lane 5 3 µM TTP, lane 6 7 µM TTP, lane
7 14 µM TTP, lane 8 25 µM TTP, lane 9 40 µM TTP, lane 10 60 µM TTP, lane 11 100
µM TTP. (B) Taq double mutant: lane 1 primer, lane 2 0.1 µM TTP, lane 3 1 µM TTP,
lane 4 3 µM TTP, lane 5 14 µM TTP, lane 6 25 µM TTP, lane 7 40 µM TTP, lane 8 60


D.  Taq  single  mutant  (V583D)  TTP 


E.  Taq  double  mutant  (V583D  E786G)  TTP 




In Vitro Selection of Phosphodiester Cleavage by DNA

Parallel SELEX experiments were performed using pools of modified and
non-modified DNA oligomers. In both experiments, novel catalytic DNA molecules were
obtained and characterized. The selection was performed as described in the
Background, and followed as closely as possible the procedure reported in the Breaker
and Joyce experiments (Breaker and Joyce, 1997).

Both pools were radiolabeled by incorporating dAT33P during PCR amplification.
The  pools  were  immobilized  on  streptavidin-agarose  columns,  which  were  repeatedly 
washed  with  Tris  EDTA  buffer  at  pH  7.6  to  rid  the  columns  of  nonbinding  DNA 
molecules.  The  column  was  treated  with  NaOH  to  disrupt  duplex  DNA  structures,  and 
immediately  washed  several  times  with  reaction  buffer  without  divalent  cation  (pH  7.6). 
The repeated washings with the neutralization buffer assured that the column was free of
unbiotinylated  DNA  and  radiolabeled  dATP.  The  single  stranded  DNA  immobilized  on 
the  column  was  then  incubated  with  reaction  buffer  for  periods  ranging  from  15  minutes 
to  2 hours.  The  eluted  single  stranded  DNA  was  collected  and  ethanol  precipitated. 
Parallel  test  PCRs  were  performed  to  determine  how  many  cycles  were  necessary  for 
each  pool  to  generate  a single  band  of  equal  intensity  on  agarose  gel.  Once  the  number  of 
cycles  was  established,  large  scale  preparative  PCRs  were  carried  out;  half  of  the  product 


having  no  catalytic  activity  remained  on  the  support,  the  radioactivity  eluting  from  the 


column  represented  those  DNA  molecules  that  had  catalytic  activity.  Both  the  standard 
and  dJ-containing  pools  were  PCR  amplified  with  alpha  labeled  dAT33P. 

The progress of the selection was followed by determining the amount of
radioactivity eluting under the selection conditions after each round. In total, 11 rounds of
selection were performed. The fraction of the library released by self-catalysis from the
support after each selection interval began to increase detectably after round 5 (Figure
12). This increase was greater for the functionalized pool than for the nonfunctionalized
pool in all rounds of selection (with significant DNA elution). Of the 11 rounds of
selection, 7 rounds were done under conditions favoring random mutation via error prone
PCR. Error prone PCR was achieved by using Vent polymerase without proofreading
function, or Vent exo-. The mutation rate was approximately 0.02% per nucleotide per
round of selection after fifteen rounds of PCR amplification.

Radiolabeled PAGE (polyacrylamide gel electrophoresis) showed that both the dJ-


A 32P-labeled primer was used as substrate. The substrate was incubated with DNA pools
for  2 hours  under  conditions  identical  to  those  used  for  the  selection,  and  the  appearance 
of  cleaved  product  was  followed  by  PAGE  (Figure  13).  The  multiple  bands  seen  in  the 
gel  for  the  cleaved  product  are  due  to  the  heterogeneous  length  of  the  substrate. 
Subsequent  analysis  with  a radiolabeled  ladder  indicated  that  the  site  of  cleavage  is  at  or 


Independent amplification of the pool derived from the dJ-containing library, but
using  TTP  instead  of  dJTP,  gave  a mixture  that  lacked  catalytic  activity.  This  indicated 
that  catalysis  in  the  dJ-containing  pool  required  the  amino  acid-like  functionality. 


Figure  12:  Percent  cleaved  or  eluted  after  eleven  rounds  of  selection.  The 
blue  graph  represents  the 
functionalized  pool. 


Figure  13:  Radiolabeled  PAGE  showing  catalysis  i 


Experiments were done with both pool DNA and individual sequences to test the
role of magnesium cation in the reaction. No detectable cleaved product was observed
after 24 hours when the reaction buffer was devoid of divalent magnesium cation. The
mechanism for hydrolysis of the phosphodiester bond may involve divalent cation as
either a general base or Lewis acid. As previously hypothesized (Santoro, 1998) for these
RNA  cleaving  deoxyribozymes,  the  bases  themselves  rarely  participate  in  the  actual 
catalysis.  They  provide  instead  the  infrastructure  for  the  divalent  cation  to  perform  the 
catalysis.  The  presence  of  more  cleaved  product  for  the  dJ-containing  pool  in  the 
radiolabeled  gel  led  to  preliminary  indications  that  the  dJ-containing  pool  contained  a 


rounds 7 through 11 (Figure 14). Figure 14 shows percentage of pool cleaved versus rate
(min^-1). After every selection round, the dJ-containing pool generates more product than
the standard base pool. These experiments were repeated, and the behavior of these curves
is mathematically analyzed in later sections of this chapter.


Figure 14: In vitro selection rounds 7-11.
pool and the blue represents the nonfunctionalized pool. In each graph, the Y-axis is the
percent cleaved and the X-axis is rate (it increases to the right).


Analysis of Sequences from Both the dJ-containing and Standard Base Pools


Sequence  Analysis 

Sequences  have  been  determined  for  each  pool  (functionalized  and 

Clark for both the dJ-containing and the non-modified SELEX experiments (Figure 15).
Remarkably,  both  convergent  and  divergent  evolution  are  observed.  Specifically, 
molecules  that  are  clearly  non-homologous  to  each  other  converged  to  the  same 
phenotype,  or  perform  the  same  catalytic  function.  Molecules  that  have  homologous 
genotypes  but  vary  in  phenotypic  character,  or  perform  catalysis  to  varying  degrees,  give 

standard base pool did not show any significant sequence homology. When the dJ
nucleotide was substituted for thymidine in the standard base pool, no noticeable cleavage
reaction occurred. Both the standard and J-base pools were chemically synthesized to begin the

purchased and made special order by Integrated DNA Technologies. The distribution of
base ratios after seven rounds of selection was determined and is given in Table 2.

four nucleotides had a ratio 1:1:1:1. The functionalized nucleotide distribution showed a
"G"-rich predilection and a "C"-poor constitution. For the functionalized pool, the "A"
and "dJ" bases did not fall outside of one standard deviation from a 25% composition of
the total nucleotide population. These findings suggest that the secondary and tertiary
structures derived from guanosine nucleotides may play an important role in catalysis for
the dJ-base library. It is cited repeatedly in literature that guanosine is abundant in DNA


aptamers (oligonucleotides selected for binding) (Huizenga, 1995; Sassanfar, 1993).
However, the guanosine rich phenomenon is not always the case for deoxyribozymes.
Since  the  amino  acid  functional  group  is  also  essential  for  catalysis,  it  could  be  possible 
that  adding  functionality  also  confers  a greater  propensity  for  more  complex  tertiary 

Kinetic analysis of specific sequences for J-base and nonmodified pools

Sequences  were  chemically  synthesized  and  tested  for  individual  rates  of 
catalysis.  Overall  seven  sequences  were  sampled  from  each  pool.  Four  sequences  from 
the  standard  base  pool  had  half  lives  recordable  within  a twenty-four  hour  time  interval. 


hour  time  interval  (Figure  15).  Each  sequence  exhibited  first  order  rate  behavior  and 

was collected either by scintillation count or by phosphorimaging (Figure 16). The data
were then fit to the equation $y = x(1 - \exp(-kt))$. The "y" variable is the fraction reacted at
time "t". The "x" variable is the fraction reacted at time infinity. The "k" variable is the
observed rate constant. This equation can also be transformed so that the data can fit a
linear plot in which the kobs value is obtained from the slope, $\ln(x/(x-y)) = k_{obs} t$.
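
The linear transformation can be sketched in a few lines. The example below is only an illustration: it generates hypothetical time points from the first-order equation, applies ln(x/(x-y)) = kobs·t, and recovers kobs as the slope of a line through the origin. The function names, time points, and the values chosen for kobs and the end point x are assumptions made for the example, not data from this work.

```python
import math


def fraction_reacted(t, k_obs, x_inf):
    """First-order progress curve: y = x_inf * (1 - exp(-k_obs * t))."""
    return x_inf * (1.0 - math.exp(-k_obs * t))


def k_obs_from_linear_plot(times, fractions, x_inf):
    """Slope of ln(x/(x - y)) versus t for a least squares line through the origin."""
    num = 0.0
    den = 0.0
    for t, y in zip(times, fractions):
        z = math.log(x_inf / (x_inf - y))  # linearized ordinate
        num += t * z
        den += t * t
    return num / den


# Hypothetical time course (minutes) with an assumed end point of 80% cleaved.
times = [15, 30, 60, 120, 240]
k_true, x_inf = 0.0067, 0.8
fractions = [fraction_reacted(t, k_true, x_inf) for t in times]

k_est = k_obs_from_linear_plot(times, fractions, x_inf)
half_life = math.log(2) / k_est  # half life of a first-order process
print(f"k_obs = {k_est:.4f} per min, half life = {half_life:.1f} min")
```

The transformation requires an estimate of the end point x; if x is poorly known, fitting the untransformed exponential with x as a free parameter is usually the safer choice.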

For each sequence, a hyperbolic curve was fit to the kinetic data by using least
squares approximation. The catalysts were immobilized on a streptavidin support and
incubated (23 °C, 50 mM HEPES, 1 M NaCl, 1 mM MgCl2, 0.02 mM EDTA). Nineteen


found  to  have  up  to  50%  variation.  However,  all  observed 


Figure 15: Unrooted neighbor joining trees using CLUSTAL W and TreeView after 7
rounds of selection. The length of each branch is proportional to its estimated
divergence. (A) functionalized pool (B) nonfunctionalized pool.


Average  0.22702  0.15337  0.34527  0.23108  0.28515  0.20279  0.25252  0.24864
Standard  0.03247  0.06926  0.05585  0.04579  0.05276  0.04503  0.03276  0.04554 


Figure 16: Functionalized sequences (s79.s7i.7b.3729.j74i); nonfunctionalized
sequences (»62 . s6i4b. s6i5b. s6i6); (rA) stands for ribonucleotide.


j79 

CTGCAGAATTCTAATACGACTCACTAT(rA)GGAAGACATGGCGACTCTCAGGGGJACAJJ
GJGGJJAAAC-GJCJGGJACGCCAGAJGJGGJGACGGJAAGCJJGGCAC


CTGCAGAATTCTAATACGACTCACTAT(rA)GGAAGACATGGCGACTCTCGJJGGGCCGCC
AGJCCGGJJAAAGGCAJGGJACGJJCGAJGJGACGGJAAGCJJGGCAC


CTGCAGAATTCTAATACGACTCACTAT(rA)GGAAGACATGGCGACTCTCAGGGGJACAJJ
GCGGJCAAAGGJCJGGJACGCCAGAJGJGGTGACGGJAAGCJJGGCAC

j741

CTGCAGAATTCTAATACGACTCACTAT(rA)GGAAGACATGGCGACTCTCGGCAJGGGGGG
JAAAAGGCAJACGGGJJACJGAAGJJAJAGJGACGGJAAGCJJGGCAC


CTGCAGAATTCTAATACGACTCACTAT(rA)GGAAGACATGGCGACTCTCACACGCACGGA
CTCGCACGTATATAGCGTAAGGTTGATAGTGACGGTAAGCTTGGCAC


CTGCAGAATTCTAATACGACTCACTAT(rA)GGAAGACATGGCGACTCTCACTGCACAATC
CAACACCGATTGCTGCAAAGGTTGTTAGGGTGACGGTAAGCTTGGCAC


DNAzyme exhibiting cleavage of the


Single turnover conditions can be achieved intramolecularly as opposed to
intermolecularly because the rate of enzyme substrate association, or the association of
two complementary oligonucleotides, is several orders of magnitude faster than the rate
of catalysis (Santoro, 1998). Thus the rate of cleavage should be the same for an
intramolecular cleavage as it is for an intermolecular one. These methods for obtaining
kobs have been previously reported (Geyer and Sen, 1997). To determine whether or not
these deoxyribozymes could exhibit enzymatic behavior, the pool of deoxyribozymes
was reacted with an excess of radiolabeled substrate. Over time, cleaved product
accumulated as expected (Figure 10).


Secondary structural analysis

The secondary structural analysis was performed by submitting known sequences
of catalytically active DNA molecules to an online secondary structure prediction
program maintained by Dr. Michael Zuker at Washington University
(http://mfold.wustl.edu). The software implements energy rules developed by SantaLucia
(SantaLucia, 1998). These energy rules determine the energetically optimal folded
structure of DNA from a library of DNA duplexes with different conformations. The
SantaLucia method uses nearest-neighbor thermodynamics and presumes that the
stability of a base pair is reliant on the identity as well as the orientation of its
neighboring bases. A few secondary structures are represented in Figure 17 from both the
functionalized (endowed) and nonfunctionalized (nonendowed) selections.


The secondary structures, for both the functionalized and nonfunctionalized


catalysis,  exhibit  a bulge  where  the  site  of  cleavage  should  take  place.  In  the 
functionalized sequences, it is worth noting that there is a high guanosine nucleobase
composition in the bulge. Many of these deoxyribozymes also possess a "pistol-like"
secondary structure. These motifs have been previously reported in deoxyribozymes
capable  of  hydrolyzing  the  phosphodiester  backbone  of  RNA  (Breaker,  1997). 

The  Quantitative  Evaluation  of  the  In  Vitro  Selection:  New  Mathematical  Models 


are at present no
mathematical formalisms to quantify DNA pool potential for catalysis or mathematical

experiments. Although in vitro selection has created alternative roles for genetic
information, little work has been done to answer the general questions about the
distribution of these selectable properties (either binding or catalysis) within a set of
random nucleic acid sequences (Szostak, 1988). This distribution is crucial for our

receptors and catalysts. It is also important for assessing the role of in vitro selection-like
processes on the origins of life on Earth or in the universe. By adding protein-like
functionality to DNA and quantifying its intrinsic improvement in performing catalysis
over nonfunctionalized DNA, we have for the first time been able to answer these


Fitting  the  Data  to  both  Theoretical  and  Statistical  Models 

To begin the quantitative analysis, a probability function is derived that relates the
frequency of a selectable property (for example, the rate constant for a reaction to be
catalyzed) in a population to the magnitude of the rate constant. This function should be


poor catalyst. Further, the definite integral from zero (molecules having no catalytic
power  at  all)  to  infinity  (molecules  having  outstanding  catalytic  power)  must  be  bounded 

A variety  of  functions  have  these  properties.  For  example,  an  exponential  probability 


$P(k) = \alpha \exp(-\alpha k)$, with $\alpha$ being a constant that describes the intrinsic limitation of a
biopolymer to generate selectable behavior. Different biopolymers, as will be seen below.

value for $\alpha$, the faster the function falls towards zero, and this means that there will be more


biopolymer (Figure 18a).


sample members of a library of random sequences, measure the rate constants for each
member sampled, and plot the distribution. In practice, however, this procedure will not be
successful with DNA or RNA libraries. In these biopolymers, catalysis is rare, and
inefficient when it does occur. Any reasonable sample size identifies no molecules with
detectable catalytic behavior. We can, of course, select from the population those molecules
that have catalytic activity, and amplify them by the polymerase chain reaction. In this
research, the selection system is arranged so that by catalyzing the cleavage, the
catalytically active molecule is released from a solid support. If the selection is run for time


$t$, then half of the molecules catalyzing the reaction with a half life of $t$ will be recovered
($k = \ln 2 / t$). A larger fraction of the catalysts with a larger $k$ and a smaller fraction of catalysts
with a smaller $k$ will also be recovered in the selection. Inactive molecules remain bound to
the support.
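
As a short worked check of this statement, assume first-order cleavage, so that a catalyst with rate constant $k$ is released from the support with probability $1 - e^{-kt}$ during a selection of length $t$. The three cases below (half life equal to, half of, and twice the selection time) are illustrative choices, not measured values.

```latex
\begin{align*}
  \text{fraction recovered} &= 1 - e^{-kt} \\
  k = \frac{\ln 2}{t}       &\;\Rightarrow\; 1 - e^{-\ln 2} = \tfrac{1}{2} \\
  k = \frac{2\ln 2}{t}      &\;\Rightarrow\; 1 - e^{-2\ln 2} = \tfrac{3}{4} \\
  k = \frac{\ln 2}{2t}      &\;\Rightarrow\; 1 - e^{-(\ln 2)/2} = 1 - \tfrac{1}{\sqrt{2}} \approx 0.29
\end{align*}
```

Faster catalysts are therefore enriched at each round, but slower ones are never removed completely in a single round.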

An analytical transformation can be used to describe the selection step. Specifically,
the selection step multiplies the initial P(k) probability distribution function by a second
function, the selection function, Q(k). For the Breaker-Joyce experiment, it is readily shown
that the function is analytical, $Q(k) = 1 - \exp(-kt)$ (Figure 18b). This yields a new
probability distribution, $F(k) = P(k) \cdot Q(k)$ (Figure 18c). After the selection, amplification

presumed not to alter the distribution. Subsequent selection steps correspond to a
multiplication of the new probability distribution function again by the "transformation
function", $1 - \exp(-kt)$, to give yet a new probability. In this example, after n rounds of
selection, the probability distribution should have the form $\alpha \exp(-\alpha k)(1 - \exp(-kt))^n$. These
mathematical transformations can be applied in the inverse. After a sufficient number of
selection steps, poor catalysts will eventually be depleted enough to allow detection of
catalytically active oligonucleotides. At this point, sampling the population encounters
individuals with measurable catalytic power a reasonable fraction of the time, and a
probability distribution can be constructed by experiment. A function representing this
distribution can then be divided by the selection function n times to generate an
approximation of the original probability distribution.
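
The forward and inverse transformations described above can be sketched numerically. The following is a minimal illustration rather than the analysis code used in this work: the function names are hypothetical, the distribution after selection is left unnormalized, and the values chosen for the constant in the exponential, the selection time, and the number of rounds are assumptions made for the example.

```python
import math


def initial_density(k, alpha):
    """Assumed exponential distribution of rate constants before selection."""
    return alpha * math.exp(-alpha * k)


def selection_function(k, t):
    """Fraction of catalysts with rate constant k released in time t."""
    return 1.0 - math.exp(-k * t)


def after_rounds(k, alpha, t, n):
    """Unnormalized distribution after n identical selection rounds."""
    return initial_density(k, alpha) * selection_function(k, t) ** n


def recover_initial(density_after, k, t, n):
    """Inverse step: divide by the selection function n times."""
    return density_after / selection_function(k, t) ** n


# Hypothetical parameters: alpha = 4.9, t = 120 min, n = 6 rounds.
alpha, t, n = 4.9, 120.0, 6
rates = [0.001, 0.005, 0.01, 0.05]  # rate constants in 1/min, for illustration

selected = [after_rounds(k, alpha, t, n) for k in rates]
recovered = [recover_initial(f, k, t, n) for f, k in zip(selected, rates)]

for k, f, p in zip(rates, selected, recovered):
    print(f"k = {k:.3f}  after selection = {f:.3e}  recovered P(k) = {p:.4f}")
```

Slow catalysts are suppressed by many orders of magnitude after several rounds, which is why the division step becomes numerically delicate for small rate constants when applied to noisy experimental data.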

The analysis described above can now be used to describe the data from the
selection. The probability distributions for catalysts in the two selection product pools after
round eight were fitted by a least squares procedure to the "selection equation,"
$\alpha \exp(-\alpha k)(1 - \exp(-kt))^n$, Figure 19. Although all data from the different rounds of selection fit this
distribution, round eight was divisible by the most transformation functions (six times for
the J-pool and four times for the nonfunctionalized pool). These yielded an intrinsic
potential value of 4.9 and 8.9 for the J-containing and standard pools, respectively. To
generate the probability distribution P(k), this function was then divided by the
transformation function n times. These functions are shown in Figure 20b.


Figure 19. Probability of finding catalysts as a function of rate. C is inversely proportional
to the intrinsic potential for catalysis, k is the rate constant for the catalytic reaction, and t is
the time allowed by the selection procedure for catalysis to occur. (A) The distribution of
catalysts before selection. (B) The selection function. (C) The new distribution of catalysts
after the initial distribution of catalysts is transformed by the selection function.




Figure 20: Round 8 functionalized (red) and nonfunctionalized (blue) data fit to the
equation $C \exp(-Ck)(1 - \exp(-kt))^n$.


Figure 21: (A) Round eight selection functionalized (red) and nonfunctionalized (blue) fit
to the equation $C \exp(-Ck)(1 - \exp(-kt))^n$. (B) Round eight selection functionalized (red) and
nonfunctionalized (blue) divided n times by the transformation function $1 - \exp(-kt)$.


At the outset, an exponential probability distribution was assumed for P(k) as well
as F(k) (Figures 19a and 19c). This assumption was arbitrary. Should we wish to relax
this assumption, we can fit the empirical data obtained after n rounds of selection to any
numerical function, and transform this using the selection function to yield a numerical
representation of the initial probability distribution that need not be exponential. To
illustrate this, the probability distribution data for catalysis in the two selection product
pools after rounds seven through eleven were fitted to a
Weibull distribution (Weibull, 1951), Figure 21.

The Weibull distribution is frequently used in statistics as a generic function for
representing data requiring the simultaneous performance of several components. In this
example, catalysis requires the simultaneous action of several different functional groups

multiple binding interactions. It is apparent
from the Weibull fit to the selection rounds that the functionalized pools possess faster
catalysts at each round of selection. At rounds ten and eleven, the selection pressure was
tightened to five minutes, as opposed to two hours. A noticeable shift to the right in both

"selection function," the Weibull distribution for round eight was divided n times by the
transformation function to yield the probability distribution of the initial population
(Figure 22d). The overall appearance of P(k)Weibull is not markedly different from
P(k)exponential; it is different theoretically, of course, at the limits of extremely fast and
extremely slow catalysts, but these differences do not fall outside of the error in the


Figure 22. Rounds 7-11 fit to a four-parameter Weibull distribution. The red line
represents the functionalized fit and blue represents nonfunctionalized. (A) round 7, (B)
round 8, (C) round 9, (D) round 10, and (E) round 11.
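
For readers unfamiliar with the Weibull form, the sketch below evaluates the standard two-parameter Weibull density and shows that it reduces to the exponential distribution when the shape parameter equals one. The exact four-parameter form used for the fits is not given in the text, so the two-parameter version, the function names, and all parameter values here are assumptions made for illustration only.

```python
import math


def weibull_pdf(k, shape, scale):
    """Two-parameter Weibull density; this sketch assumes shape >= 1."""
    if k < 0:
        return 0.0
    z = k / scale
    return (shape / scale) * z ** (shape - 1.0) * math.exp(-(z ** shape))


def exponential_pdf(k, alpha):
    """Exponential density alpha * exp(-alpha * k)."""
    return alpha * math.exp(-alpha * k)


def area_under(pdf, upper, steps=3000):
    """Trapezoid rule estimate of the integral of pdf from 0 to upper."""
    h = upper / steps
    total = 0.5 * (pdf(0.0) + pdf(upper))
    for i in range(1, steps):
        total += pdf(i * h)
    return total * h


# With shape = 1 and scale = 1/alpha the Weibull is the exponential.
same = weibull_pdf(0.1, 1.0, 1.0 / 4.9)
expo = exponential_pdf(0.1, 4.9)

# A shape above 1 gives a peaked density that still integrates to one.
area = area_under(lambda k: weibull_pdf(k, 2.0, 0.3), 3.0)
print(f"weibull = {same:.6f}, exponential = {expo:.6f}, area = {area:.6f}")
```

The extra shape parameter is what allows the fitted curve to depart from a strictly exponential profile while containing the exponential as a special case.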




Quantifying  Improvement  Using  the  Selection  Equation 

selection equation and a statistical fit by the Weibull distribution have both shown that
the initial distribution profiles for functionalized DNA possess better catalysts. Several
other qualitative indications also suggest that a functionalized DNA pool is
intrinsically better. For example, although none of the data from the selection rounds
could be fit by the selection equation with n equal to the number of rounds performed, the
functionalized pool consistently fit the selection equation with a higher number of
transformation functions, $1 - \exp(-kt)$. This may

rounds of selection. There could have been more fast catalysts in the functionalized
pool. However, this was not obvious from our empirical data. The detectable elution of
catalysts from both the functionalized DNA and nonfunctionalized DNA occurred at the
same time. But a higher percentage of catalysts was consistently eluted from the
functionalized pool (Figure 12), and a greater number of catalysts could be the reason.

The average rates among the separate pools after each round of selection also
yielded some qualitative data (Table 4). Not only was the average rate better for the
functionalized pool in each round of selection, but the slowest observed rate for the
functionalized pool (0.0067) was still faster than the fastest observed rate for the
nonfunctionalized pool (0.0027). This is not expected, since the selection pressure
for rounds ten and eleven was tightened to five minutes as opposed to two hours.

Because we have defined the intrinsic potential to mean an improvement in the
probability of finding catalysts, a direct correlation can be made between the average rate
found at a selection round and the intrinsic potentials derived from that selection round.


Theoretically we can make this quantitative statement here, but unfortunately, the failure
of the data to be fit by an equal magnitude to its selection round makes this quantitative
statement analytically improbable.

affected by the variable in front of the exponent, otherwise known as the carrying
capacity. If the probability distribution for an exponential function is set to unity, then
by definition, both constants must be the same (Nelson, 1982). On the other hand, it is
unreasonable to assume that the behavior of this system can be defined by only one
variable. That is, chemical reactions involve a series of different interactions that can be


Still, the modified “selection equation”


does fit all the data for all the selection rounds, and this statement, though weak,
holds some qualitative (but not quantitative) validity.


3.  Quantifying  Improvement  Using  the  Weibull  Distribution 

A better  way  of  quantifying  improvement  in  the  intrinsic  potential  for  a 
functionalized  library  to  generate  catalysts  makes  reference  to  the  Weibull  distributions. 
The  Weibull  distribution  is  ideal  because  it  uses  several  parameters.  These  appear  to  be 
needed  to  capture  the  features  of  the  probability-catalyst  distribution.  The  data  from  each 
round of selection fit a Weibull distribution with high R-squared values.

Dividing the Weibull distributions describing the data from the functionalised
library by the Weibull describing data from the nonfunctionalized library in round 7
generates a quotient curve, shown in Figure 24. The nonfunctionalized pool has more
poor catalysts than the functionalised pool. Because we are dividing the functionalised
Weibull (numerator) by the nonfunctionalized Weibull (denominator), the ratio should be
less than one for low values of kobs. For faster catalysts, the denominator becomes


ve  approaches  unity.  For  still  faster 
catalysts,  the  functionalised  Weibull  is  larger  than  the  unfunctionalized  Weibull,  and  the 
ratio  becomes  greater  than  unity  and  continues  to  increase. 

The quantitative improvement of functionalisation can be measured by looking at
the quotient curve from round seven. Round seven is ideal
because it is the first round of selection in which catalytic DNA appears. We can
arbitrarily take the mode (the peak of the quotient curve) for Round 7, or any value (rate)
in the portion of the distribution greater than one. We can then determine a probability
of finding this rate, say X, for both the functionalized and nonfunctionalised distributions.
For the nonfunctionalised distribution we can estimate a probability, Y, of finding this
rate. We can do the same for the functionalized distribution and estimate a probability, Z.


With these probabilities, a quantitative statement can be made where a rate of X is Z/Y
times more likely to be found in a functionalized pool than in a nonfunctionalized pool.

The probability of finding a catalyst with a kobs greater than E is defined as
$P(E) = \int_E^{\infty} f(y)\,dy$, where f(y) is the distribution function. To determine the
probability of finding


and faster, the reliability function, R(y) = P(Y > y) = 1 - F(y), would be used. For the
Weibull distribution, F(y) is defined as $1 - \exp[-(y/\alpha)^{\beta}]$, where $\alpha$ is the scale
parameter and $\beta$ is the shape parameter. This applies to any parameter Weibull distribution
(Nelson, 1982; Johnson, 1994).

Before drawing conclusions about the improvement in the catalytic potential of a
DNA library arising through functionalization, the raw data must be put through a
rigorous statistical test to assess the possibility of differences due to stochastic
uncertainty. One well-known method of statistical testing involves calculating the
average rates and their respective 95% confidence intervals. If the confidence intervals
do not overlap, then the difference between the averages is statistically significant. At
round seven, an average rate of 0.0014/min with a 95% confidence interval of 0.0011 was
calculated for the functionalized pool, and an average rate of 0.00049/min with a 95%
confidence interval of 0.00029/min was calculated for the nonfunctionalized pool. Clearly,
the average rates


all the catalysts up to rate X, we use the cumulative distribution function, F(y) = P(Y < y).


p.  This  indicates  that  the  difference  between 


say with some confidence that a quantitative estimation of improving the probability
in finding catalysts (or any subsequent transformation of these data) will be statistically
significant.

At round seven, the mode of the quotient curve was found to be 0.0047/min.
Using the reliability function R(y), the probability of finding a catalyst with a rate of
0.0047/min or faster for the functionalized pool at round seven was 0.57 (Figure 25a). The
probability of finding the same catalyst or faster in the nonfunctionalized pool was only
0.42 (Figure 25b). The following quantitative statement can now be made: a catalyst
with a rate of 0.0047/min or faster is 1.36 times more likely to be found in the
functionalized pool than in the nonfunctionalized pool.
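The arithmetic behind this statement can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the two-parameter Weibull form F(y) = 1 - exp[-(y/a)^b] quoted above (the fits in the figures are described as four-parameter), and the function names are hypothetical. The only numbers taken from the text are the two reliabilities, 0.57 and 0.42.

```python
import math


def weibull_cdf(y, scale, shape):
    """Cumulative distribution F(y) = 1 - exp[-(y/scale)^shape]."""
    return 1.0 - math.exp(-((y / scale) ** shape))


def weibull_reliability(y, scale, shape):
    """Reliability R(y) = P(Y > y) = 1 - F(y)."""
    return 1.0 - weibull_cdf(y, scale, shape)


def fold_improvement(r_functionalized, r_nonfunctionalized):
    """Ratio Z/Y of the two reliabilities evaluated at the same rate."""
    return r_functionalized / r_nonfunctionalized


# Reliabilities reported in the text for a rate of 0.0047/min at round seven.
ratio = fold_improvement(0.57, 0.42)
print(round(ratio, 2))
```

With fitted scale and shape parameters for each pool, `weibull_reliability` would supply the two inputs to `fold_improvement` directly.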


[Figure: Round 7 quotient curve, dividing endowed by nonendowed Weibull distributions; x-axis: Log(kobs)]


Figure 25. The reliability function R(y) calculated for the functionalized pool and

c7=1.8707, x70=0.7717, and x=0.828. For the nonfunctionalized pool: b71=0.1838,
c71=1.639, x701=0.6093, and x701=0.6093.


= 0.423 


Evaluating the Effect of the J-base


As an alternative way to determine how the presence of the J-base contributed to

20). The ratio of the slopes between endowed pool over the nonendowed DNA pool was
approximately 1.5. The slope of the line describing catalysis by the functionalized pool
was 0.4; the corresponding slope was 0.6 for the nonfunctionalized pool. The slopes may
be used to estimate the number of positions in the nucleotide sequence that must be
defined to obtain a factor of 10 increase in the rate of catalysis. Specifically, to improve
the rate of catalysis for the nonmodified pool by a factor of 1000, three additional
positions in the nucleotide sequence must be defined. In contrast, two nucleotides in the
sequence must be defined to improve the rate of catalysis in the functionalized DNA pool
by a factor of 1000.

Analysis of variance was determined from the two sets of data using
Statistical Analysis Software (SAS). This suggested that the differences in the curves are
statistically significant. A p-value of 0.0023 was derived, thus rejecting the null


Functionalized  vs.  Standard  Base  Pool  Catalysts 


The previous Weibull distribution analyses estimated the improvement of the
functionalized library over the nonfunctionalized library by comparing data at round
seven without extrapolation back to round zero. This estimation is adequate because any
transformation of these data would still reflect the properties of the distribution for that
particular round, whether it is mutation, selection pressure, or experimental conditions.

approximation.  We  can  also  visually  appreciate  the  initial  distribution  of  catalysts  for 

improvement  by  extrapolating  back  to  round  zero  and  comparing  directly  from  round 
seven  will  give  the  same  value  of  improvement  for  the  functionalized  library. 

To obtain these initial distributions, the Weibull functions of round seven for both
the functionalized and nonfunctionalized libraries were divided seven times by the
transformation function, Q(k) = 1 - exp(-kt) (Figure 28). The distributions revealed that

the nonfunctionalized data. Specifically, the functionalized pool is 1.4 times more likely
to find catalysts with rate constants of 0.001 per minute than the nonfunctionalized pool.
The fastest catalysts, according to these distributions, are 0.06 per minute for the
functionalized pool and 0.03 per day for the nonfunctionalized pool.
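The back-extrapolation described above (dividing the round-seven distribution seven times by Q(k) = 1 - exp(-kt)) can be illustrated with a short Python sketch. The scale and shape values below are hypothetical placeholders, not the fitted parameters. The two-parameter Weibull density is an assumption made for brevity, the two-hour selection window (t = 120 minutes) is taken from the selection conditions described for the earlier rounds, and the result is left unnormalized.

```python
import math


def weibull_pdf(k, scale, shape):
    """Two-parameter Weibull density evaluated at rate constant k."""
    z = k / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))


def transformation(k, t):
    """Q(k) = 1 - exp(-kt): fraction of molecules with rate k that react within time t."""
    return 1.0 - math.exp(-k * t)


def round_zero_density(k, scale, shape, t, rounds=7):
    """Round-n density divided n times by Q(k); not renormalized."""
    return weibull_pdf(k, scale, shape) / transformation(k, t) ** rounds


SCALE, SHAPE = 0.005, 1.5  # hypothetical fit parameters, for illustration only
T_SELECT = 120.0           # minutes (two-hour selection window)

for k in (0.0001, 0.001, 0.01, 0.06):
    after_seven = weibull_pdf(k, SCALE, SHAPE)
    initial = round_zero_density(k, SCALE, SHAPE, T_SELECT)
    print(k, after_seven, initial)
```

Because Q(k) is close to zero for slow catalysts and close to one for fast ones, the division mainly restores the slow end of the distribution that selection removed.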


Figure 28. Weibull functions of round seven for both the functionalized and
nonfunctionalized libraries divided seven times by the transformation function, Q(k) =
1 - exp(-kt). Plot title: Round Zero Probability Distribution of Catalysts, Functionalized
vs. Nonfunctionalized; x-axis: log (rate constant).


[Pages missing or unavailable]


MATERIALS  AND  METHODS 


Integrated DNA Technologies, Inc. and PAGE purified. The starting pool, 5'-
GTGCCAAGCTTACCGTCC-N40-

AGATGTCGCCATCTCTTCCTATAGTGAGTCGTATTAG-3'  was  ordered  to  have  the 


5'CTGCAGAATTCTAATACGACTCACTATrAGGAAGACATGGCGACTCTC-3' 

streptavidin column for selection or it was 3' gamma-AT32P labeled for PAGE analysis.
Primer 3, 5'-GTGCCAAGCTTACCGTCAC-3', was either not modified or it was in the


DNA  Pool  Preparation 


prepared by PCR using the 80 pmoles of starting pool. The PCR reaction was done
with 0.5 units/μL of Vent polymerase (New England Biolabs, Inc.) in the presence of
10 mM KCl, 10 mM (NH4)2SO4, 20 mM Tris-HCl (pH 8.8), 2 mM MgSO4, 0.1%
Triton X-100, 60 pmoles of primer 1 (biotinylated), and 60 pmoles of primer 2 with
no modification, and 0.25 mmoles of each dNTP. Thymidine triphosphate was used
only for the standard base selection and JTP was only used for the J-base selection; at
no time was either nucleotide triphosphate interchanged during selection. The PCR
was cycled at 94°C for 2 minutes, 55°C for 2 minutes, then 72°C for 4 minutes for
eight cycles. Each PCR reaction was done in 200 μL. The PCR product was then
phenol-chloroform extracted twice and then ethanol precipitated with glycogen as a
carrier. The ethanol precipitation was carried out overnight at -80°C and subsequently
centrifuged at 4°C at 14,000 RPMs for 20 minutes. The supernatant was decanted
and 70% ethanol was added to each sample. The sample was then centrifuged for 2

placed in the Eppendorf speed vacuum for 45 minutes. When dry, the samples were
stored at 4°C until selection was ready to be performed.


Each of the two starting pools was redissolved in 200 μL of Tris-EDTA
(10 mM Tris-HCl, 1 mM EDTA) (pH 7.6). One hundred microliters was stored at
-20°C and the rest was used for selection. Two 15 mL Bio Rad chromatography
polystyrene columns were labeled and 200 μL of streptavidin (Fluka Biochemicals)
were loaded to each column. Each column was washed with 1 mL of Tris-EDTA
buffer (10 mM Tris-HCl, 1 mM EDTA) (pH 7.6) and capped. The redissolved pool
was added to the column and gently agitated to resuspend the streptavidin with the
pool mixture. The column and pool were incubated at 23°C for 20 minutes. The
column was then washed 10 times with TE (pH 7.6), followed by 3 volumes of 0.2 M
NaOH, then neutralized with 8 column volumes of Buffer A (50 mM HEPES, 1 M
NaCl, pH 7.6). One column volume of reaction buffer or Buffer B (50 mM HEPES,
1 M NaCl, 1 mM MgCl2, 0.02 mM EDTA) was added to equilibrate the immobilized
DNA with the divalent cation. The single stranded DNA was eluted for 20 minutes,
then 40 minutes, then 2 hours with 200 μL of Buffer B after each successive
incubation. The eluted filtrate was ethanol precipitated with 3 M ammonium acetate and
glycogen as a carrier at -80°C overnight. The ethanol precipitation was centrifuged at
4°C at 14,000 revolutions per minute (RPMs) or 12,500 g for 20 minutes. The
supernatant was decanted and 70% ethanol was added to each sample. The sample
was then centrifuged for 2 minutes at 14,000 RPMs at 4°C. The supernatant was
decanted and the samples were placed in the Eppendorf speed vacuum for 45 minutes.
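The protocol equates 14,000 RPM with roughly 12,500 g. As a hedged cross-check, the standard relation between rotor speed and relative centrifugal force can be evaluated in a few lines of Python. The rotor radius is not stated in the text; the value computed below is only the radius implied by the two quoted numbers, and the function names are hypothetical.

```python
import math

STANDARD_GRAVITY = 9.80665  # m/s^2


def rcf(rpm, radius_cm):
    """Relative centrifugal force (multiples of g) at a given rotor speed and radius."""
    omega = 2.0 * math.pi * rpm / 60.0  # angular velocity in rad/s
    return omega ** 2 * (radius_cm / 100.0) / STANDARD_GRAVITY


def implied_radius_cm(rpm, rcf_value):
    """Rotor radius (cm) that would give rcf_value at the stated rpm."""
    omega = 2.0 * math.pi * rpm / 60.0
    return rcf_value * STANDARD_GRAVITY / omega ** 2 * 100.0


# Radius implied by the quoted pair (14,000 RPM, 12,500 g); an inference, not a stated value.
radius = implied_radius_cm(14000, 12500)
print(round(radius, 1), round(rcf(14000, radius)))
```

An implied radius of a few centimeters is typical of a benchtop microcentrifuge rotor, so the two quoted figures are mutually consistent.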

The samples were resuspended in 100 μL of double distilled water (Millipore
Systems). Test PCR reactions using 5 μL of resuspended sample added to 50 μL of
PCR mix (60 pmoles primer 1 and primer 2, 0.25 mmoles dNTP, 2 mmoles MgSO4,
10X buffer) and 0.5 units/μL of Vent DNA polymerase were done until a single band
could be visualized on 4% agarose gel. Usually 15-20 cycles were done. Using the
same conditions, large scale PCRs were done. Four of the PCR reactions for the
large-scale amplifications were radiolabeled with alpha-dAT32P. The PCR product
was then phenol-chloroform extracted twice and then ethanol precipitated with
glycogen as a carrier. The ethanol precipitation was centrifuged at 4°C at 14,000
RPMs (12,500 g) for 20 minutes. The supernatant was decanted and 70% ethanol was
added to each sample. The sample was then centrifuged for 2 minutes at 14,000
RPMs (12,500 g) at 4°C. The supernatant was decanted and the samples were placed
in the Eppendorf speed vacuum for 45 minutes.

Rounds 1-4 were performed as above; rounds 5-12 were done using 0.5
units/μL of Vent exo- polymerase to promote evolution via error prone PCR. Rounds
10-12 were collected after 5 minutes.

Cloning  and  Sequencing 

For subcloning and sequence analysis, molecules eluted at the end of rounds
five, six, and seven of selection were amplified by PCR using Taq DNA polymerase to
reincorporate standard TTP deoxynucleotides instead of JTP deoxynucleotides in the
functionalized pool. PCR was performed under the following conditions: 100 μL
PCR reactions (60 pmoles primer 1, 60 pmoles primer 2, 0.25 mmoles dNTPs, 50 mM
Tris-HCl, 100 mM NaCl, 0.1 mM EDTA, 1 mM DTT, 50% glycerol and 1% Triton,
mM MgCl2); 8-15 cycles of 1 minute at 94°C, 2 minutes at 55°C, and 2 minutes at
72°C, with a final extension for 8 minutes at 72°C. The final extension allowed Taq
to add a single nontemplated deoxyadenosine for cloning into a vector with a
deoxythymidine 3' overhang. The pCR 2.1-TOPO vector (Invitrogen), a pUC18
derivative, was used for cloning. After a five minute ligation, competent E. coli strain




TOP10 (Invitrogen) was transformed as described by the supplier and plated onto LB
plates with kanamycin. IPTG was used for blue-white screening.

White colonies were randomly chosen and streaked onto LB plates with
kanamycin. Plasmid purification was performed by traditional alkaline lysis methods
or by using a Qiagen kit. Sequencing reactions used the Big Dye Terminator
(Applied Biosystems, Inc.). PCR conditions for the sequencing reaction were as
follows: 25 cycles of 30 s at 96°C, 15 s at 50°C, and 4 minutes at 60°C. PCR
products were held at 4°C and then applied to handmade Sephadex G-50 spin
columns made from 0.5 mL Eppendorf tubes and 1/8 inch polyethylene filter material
(Pores). Purified plasmids were submitted as templates to a sequencing service
(Sequencing Core, University of Florida).

Approximately 3-5 mL overnight cultures were grown from individual
colonies in Terrific Broth (Sigma) or LB Broth (Difco) in 15 mL Falcon tubes in a
37°C shaker. Three mL of culture were pelleted in microfuge tubes by centrifugation
at 14,000 RPM for 3 minutes. The media was then decanted. The bacterial pellet was
resuspended in 300 μL glucose/Tris/EDTA (GTE) solution with RNase A by gentle
vortexing, making sure that the entire pellet was resuspended. The cells were lysed
using 300 μL SDS/0.2 N NaOH and inverted several times until the solution cleared.
Three hundred microliters of neutralization solution (5 M potassium acetate) was


d,  with  gentle  inversion. 


suiting  precipiti 


centrifuged 


minutes at 14,000 RPMs to pellet denatured proteins and chromosomal DNA. The
supernatant was then transferred to sterile microfuge tubes.


Chloroform extraction was done to remove any remaining proteins. The
upper phase was transferred to clean microfuge tubes, precipitated with propanol,
and centrifuged for 30 minutes. The supernatant was carefully decanted and
discarded. The pellet was washed in 70% ethanol and vortexed. After briefly
centrifuging to reposition the pellet, the supernatant was decanted, and the remaining
liquid was carefully aspirated. The pellets were air dried or heated briefly (5 seconds)
on a 75°C heat block. Dried pellets were resuspended in 50 μL ddH2O. Fifty
microliters of PEG 8000/1.6 M NaCl was added and allowed to incubate on ice for 30-
60 minutes to remove salts. After a 15 minute centrifugation at 12,000 RPMs to
pellet the plasmid DNA, the pellet was washed with 500 μL 70% ethanol, vortexed,
recentrifuged briefly, decanted and carefully aspirated. The pellets were dried again
on a 75°C heat block.


Primer  Extension  Screening 

Both primer 3, 5'-GGTCGTCTAGAGTATGCGGTAG-3', and Template 1
(5'-ACCCTACCGCATACTCTAGACGACC-3') were purchased from Integrated
DNA Technologies and PAGE purified. Primer 3 was redissolved in double distilled
water to have a final concentration of 1 microgram/microliter. In a final reaction
volume of 50 μL, 33 μL of primer was added to 5 μL of 10X buffer (70 mM Tris-
HCl, pH 7.6, 10 mM MgCl2, 5 mM dithiothreitol), 2 μL of gamma-AT32P (Pharmacia
Biotech), 2 μL of T4 polynucleotide kinase (New England Biolabs), and 8 μL of


37°C for 30 minutes


buffer (200 mM Tris-HCl pH 8.8, 100 mM KCl, 100 mM (NH4)2SO4, 20 mM MgSO4),
and 11.7 μL of ddH2O. The reaction was incubated at 72°C for four minutes. 5. Pwo;
JTP, 3 μL 10X buffer (100 mM Tris-HCl pH 8.85, 250 mM KCl, 50 mM (NH4)2SO4,


primer-template solution, 0.7 μL of standard 10 mM dNTPs or 0.7 μL 10 mM JTP, 3
μL 10X buffer (20 mM Tris-HCl, 10 mM KCl, 10 mM (NH4)2SO4, 2 mM MgSO4,
0.1% Triton X-100 pH 8.8), 1 unit Bst, and 11.7 μL ddH2O. The reaction was


10 mM MgCl2, 1 mM DTT pH 7.5), 1 unit T7, and 11.7 μL of ddH2O. The reaction
was incubated at 72°C for four minutes. 3. Taq; 12 μL primer-template solution, 0.7
μL of standard 10 mM dNTPs or 0.7 μL 10 mM JTP, 3 μL 10X buffer (50 mM KCl,
10 mM Tris-HCl pH 9.0, and 0.1% Triton X-100), 1.8 μL MgCl2, 1 unit Taq, and 9.9
μL of ddH2O. The reaction was incubated at 72°C for four minutes. 4. Tth; 12 μL

μL 10X buffer (50 mM KCl, 10 mM Tris-HCl pH 9.0, and 0.1% Triton), 1 unit Tth, 3
μL MgCl2, and 8.7 μL of ddH2O. The reaction was incubated at 72°C for four


MgCl2, 7.5 mM DTT pH 7.5), 1 unit DNA pol I, and 11.7 μL of ddH2O. The reaction
was incubated at 72°C for four minutes.

formamide/EDTA loading buffer (9 mL formamide, 0.1 mL 0.5 M EDTA, 7 μg
bromophenol blue (Fisher Scientific)). Seven microliters were loaded onto a 12%
PAGE denaturing gel (1X TBE, 2000 V, 3 hours), dried for 20 minutes at 80°C on a
gel dryer (BioRad), and visualized on a phosphorimager (BioRad).

Determining Microscopic Rate Constants for Family B Polymerases

For the running start experiments, the following oligonucleotides were used:
Primer (5'-GGTCGTCTAGAGTATGCGGTAG) and the Template
(5'-ACCCTACCGCATACTCTAGACGACC). Both DNA oligonucleotides were
PAGE purified and purchased from Integrated DNA Technologies. The following
reagents were prepared: quenching solution (0.02 M EDTA in 95% formamide);
primer/template stock solution (97 μL 290 nM 5'-radiolabelled primer, 133 μL 250 nM
template, 20 μL 10X buffer: 10 mM KCl, 10 mM (NH4)2SO4, 20 mM Tris-HCl (pH
8.8), 2 mM MgSO4, 0.1% Triton X-100); trap stock solution (75 μL 100 mM
unlabelled primer, 150 μL of 100 mM template stock, 25 μL of 10X buffer);
polymerase stock solution (diluted to a concentration of 0.1 units/μL); and dGTP or
running base stock solution (25 μM). The primer/template and trap stock solutions


Both  solutions  were  then  added  together. 
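As a quick consistency check on the oligonucleotides listed for the running start experiments, the following Python sketch confirms that the primer anneals to the 3' end of the template and reports which bases a polymerase would add. The sequences are copied from the text; the helper names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

PRIMER = "GGTCGTCTAGAGTATGCGGTAG"       # 5' -> 3', from the text
TEMPLATE = "ACCCTACCGCATACTCTAGACGACC"  # 5' -> 3', from the text


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' -> 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def extension_product(primer, template):
    """Bases added to the primer 3' end when it is fully extended on the template."""
    full_length = reverse_complement(template)
    if not full_length.startswith(primer):
        raise ValueError("primer does not anneal to the 3' end of the template")
    return full_length[len(primer):]


print(extension_product(PRIMER, TEMPLATE))
```

The extension is two G residues followed by a single T position, which matches the design described here: dGTP serves as the running base and the target TTP or JTP is incorporated opposite the final template A.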


Concentrations of 0 μM, 0.1 μM, 1 μM, 2 μM, 3 μM, 7 μM, 14 μM, and 27.5 μM
target TTP and JTP solutions were made after adding 10 μL of polymerase stock
solution and 12.5 μL of GTP stock solution. The overall volume of the target dNTP

The reactions began by placing 5 μL of the primer/template/trap solution into
different labeled microfuge tubes, according to dNTP concentration. Then 5 μL of the
appropriate concentration of target dNTP was added. The reaction mixture was
placed on a heating block set at 72°C and incubated for 5 minutes. The reaction was
then quenched with 10 μL quenching solution and placed on ice. The samples (7 μL)
were loaded onto a 12% PAGE gel and run at 2500 V. The gels were dried and
exposed to phosphorous isotope imaging plates and visualized and quantified on the
Molecular Imager (Bio Rad).
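The text does not state how the quantified gel data were converted into kinetic constants. One common approach, shown here purely as a sketch under that assumption, is to fit the Michaelis-Menten equation v = Vmax[S]/(Km + [S]) across the dNTP concentration series. The example uses the Hanes-Woolf linearization, [S]/v = [S]/Vmax + Km/Vmax, and synthetic noise-free rates generated from made-up constants. Only the concentration series comes from the text.

```python
def michaelis_menten(s, vmax, km):
    """Initial rate at substrate concentration s."""
    return vmax * s / (km + s)


def hanes_woolf_fit(concs, rates):
    """Estimate (Vmax, Km) by linear regression of [S]/v against [S]."""
    pairs = [(s, s / v) for s, v in zip(concs, rates) if s > 0 and v > 0]
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


# Concentration series from the text (micromolar); the zero point is skipped by the fit.
CONCS_UM = [0, 0.1, 1, 2, 3, 7, 14, 27.5]

# Synthetic rates from made-up constants, used only to show that the fit recovers them.
TRUE_VMAX, TRUE_KM = 1.0, 2.0
rates = [michaelis_menten(s, TRUE_VMAX, TRUE_KM) for s in CONCS_UM]

vmax_fit, km_fit = hanes_woolf_fit(CONCS_UM, rates)
print(vmax_fit, km_fit)
```

With real, noisy measurements a direct nonlinear fit is usually preferred, since linearizations distort the error structure.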

dJ-base Pool versus Standard Base Pool Intermolecular Cleavage

Two Bio Rad chromatography columns were labeled and 200 μL of
streptavidin (Fluka Biochemicals) were loaded to each column. Each column was
washed with 1 mL of Tris-EDTA (10 mM Tris-HCl, 1 mM EDTA) buffer (pH 7.6)
and capped. The redissolved pool was added to the column and gently agitated to
resuspend the streptavidin with the pool mixture. The column and pool were
incubated at 23°C for 20 minutes. The column was then washed 10 times with TE
(pH 7.6), followed by 3 volumes of 0.2 M NaOH. The eluted NaOH fraction was
neutralized with 3 column volumes of 0.2 M HCl. The single stranded DNA was
ethanol precipitated with glycogen as a carrier and incubated at -80°C overnight. The
ethanol precipitation was centrifuged at 4°C at 14,000 RPMs for 20 minutes. The
supernatant was decanted and 70% ethanol was added to each sample. The sample
was then centrifuged for 2 minutes at 14,000 RPMs at 4°C. The supernatant was
decanted and the samples were placed in the Eppendorf speed vacuum for 45 minutes.

The pellet was redissolved in 100 μL neutralization buffer (50 mM HEPES,
1 M NaCl, pH 7.6) and measured by UV absorption at 260 nm. Thirty pmoles of pool
DNA from both the standard and nonfunctionalized pools were taken out and added
to two separate Eppendorf tubes containing 120 pmoles of substrate
(5'-CTGCAGAATTCTAATACGACTCACTATrAGGAAGACATGGCGACTCTC-3')
that was redissolved in 99 μL neutralization buffer. The reaction mixture was
initiated by adding 1 μL of 0.1 M MgCl2. The reaction was allowed to proceed for 2
hours at room temperature and quenched with 0.02 M EDTA in 95% formamide.
Ten microliters were loaded onto 12% PAGE and run at 2500 V for 2.5 hours. The
gel was dried and exposed overnight to 32P exposure plates (Bio Rad). The bands
were visualized with a phosphorimager (Bio Rad).

Kinetic  Analysis  of  Standard  Base  Sequences  and  J-base  Sequences 

Seven standard and seven J-pool sequences were tested for catalytic activity.
Sequences were chosen to represent the several catalytic families that had emerged by
round seven of selection. The standard sequences and complements of the J-pool
sequences were synthesized (Integrated DNA Technologies), biotinylated, and
radiolabeled. The J-pool sequences were PCR-amplified using a 3'-biotinylated
primer, which contained the single ribonucleotide, in a 100-μL reaction (5 U/μL
PWO polymerase, 10X buffer containing 20 mM MgSO4, 0.2 mM dNTP, 0.007 mCi
α-dATP). The PCR was set to run for 15 cycles of 60 s at 94°C, 120 s at 55°C, and
120 s at 72°C.

Following  ethanol  precipitation,  the  catalytic  activity  of  each  DNA  was 
investigated  under  SELEX  conditions.  Over  a 24-hour  period,  one  column  volume  of 

cleaved versus time was plotted from either scintillation count data or quantified by
the Molecular Imager (Bio Rad). The catalytic rates were obtained by nonlinear
regression analysis (Microsoft Excel and Sigma Plot) using the function
y = x(1 - e^(-kt)), where y represents the fraction cleaved at time t, x represents the
cleavage at time infinity, and k represents the observed rate constant.
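The regression described above was carried out in Excel and Sigma Plot. The same model, y = x(1 - e^(-kt)), can be fitted with a short standard-library Python sketch, shown here as an illustration rather than a reproduction of the original analysis. For each trial value of k the best plateau x follows from linear least squares, so only k needs a one-dimensional search. The time points and the "true" constants below are synthetic, chosen only to show that the routine recovers them.

```python
import math


def first_order(t, plateau, k):
    """Fraction cleaved at time t for plateau x and observed rate constant k."""
    return plateau * (1.0 - math.exp(-k * t))


def _profile(k, times, fractions):
    """Best plateau for a fixed k (linear least squares) and the resulting squared error."""
    g = [1.0 - math.exp(-k * t) for t in times]
    plateau = sum(y * gi for y, gi in zip(fractions, g)) / sum(gi * gi for gi in g)
    sse = sum((y - plateau * gi) ** 2 for y, gi in zip(fractions, g))
    return sse, plateau


def fit_first_order(times, fractions, k_lo=1e-5, k_hi=1.0, n_grid=200, iters=80):
    """Return (plateau, k): coarse log-spaced scan of k, then golden-section refinement."""
    lo, hi = math.log(k_lo), math.log(k_hi)
    logs = [lo + i * (hi - lo) / (n_grid - 1) for i in range(n_grid)]
    best = min(range(n_grid), key=lambda i: _profile(math.exp(logs[i]), times, fractions)[0])
    a = logs[max(best - 1, 0)]
    b = logs[min(best + 1, n_grid - 1)]
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if _profile(math.exp(c), times, fractions)[0] < _profile(math.exp(d), times, fractions)[0]:
            b = d
        else:
            a = c
    k = math.exp((a + b) / 2.0)
    return _profile(k, times, fractions)[1], k


# Synthetic, noise-free time course over 24 hours (minutes); values are illustrative only.
TIMES_MIN = [20, 40, 120, 240, 480, 960, 1440]
data = [first_order(t, 0.8, 0.0047) for t in TIMES_MIN]

plateau_fit, k_fit = fit_first_order(TIMES_MIN, data)
print(plateau_fit, k_fit)
```

Searching k on a logarithmic scale keeps the scan meaningful across the several orders of magnitude that separate slow and fast catalysts.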


CONCLUSIONS 


As stated before, the goal of this research is to create a better candidate for the
single biopolymer life-form hypothesis. Not only were catalysts created from
functionalised DNA, but a chemically endowed DNA library has been shown to possess a
greater potential for producing catalysts than a nonendowed DNA library. These results
point to two novel concepts in the fields of in vitro selection and the origins of life.

Use  of  a Functionalized  Base  by  a Thermophilic  Polymerase  (First  Concept) 

The first general concept can be stated: A protein-like DNA triphosphate base can
be PCR amplified by a thermophilic Archaeal polymerase. This amplification is efficient
enough to be used for in vitro selection experiments to derive multifunctional
oligonucleotides for catalysis.

Specifically, a functionalized thymidine with a positively charged amino group on
the 5-position can be incorporated into a template by Vent polymerase with a decreased
relative velocity of approximately 30-50% compared to the standard TTP substrate.
When the positive charge on the J-base is protected with a phenylacetyl group, the
relative Vmax is restored, indicating that the positive charge is responsible for the
decrease in the rate of polymerization. However, the ability of Vent polymerase to bind
to this molecule is only moderately affected, and the Km values between the TTP
substrate and the JTP molecule can nearly be corrected within one standard deviation.


An evolutionary statement has also been made. A screen between family B and
family A polymerases shows that all of the family B polymerases can use the J-base as a
substrate, but most of family A cannot. A specific aspartate residue, analogous to the


the J-base. The sequence alignment presented in this research gives proof that all family B
polymerases have this aspartate residue while family A polymerases do not. Further, the
computer generated structure of the Vent polymerase with a J-base substrate (modeled by


methods) shows that the amino propynyl arm of the J-base is at or in close proximity to
the aspartate residue 543. This is the first time a three-dimensional structure is proposed
for the Vent polymerase and a modified substrate has been used to probe the active site to
determine an evolutionary predilection. The following are postulates derived from the
aforementioned findings concerning the first general concept:

Postulate 1: When using a family B polymerase, if the functionalised nucleotide
substrate has a Km correctable within one standard deviation compared to its
standard nucleotide counterpart, and its Vmax is only moderately affected, then
all other family B polymerases are capable of catalyzing a phosphodiester bond
between the functionalized base and its primer.

Postulate 2: Although the positive charge from the J-base decreases the Vmax,
its presence also allows the family B polymerases to incorporate the J-base
triphosphate by drawing the positively charged functional group away from the two
other aspartate residues in Region I and Region II (in Vent these residues are
D407 and D545), which are more important for catalysis. The absence of this
analogous residue in family A polymerases puts this family of polymerases at a
disadvantage in incorporating the dJ triphosphate.


From the J-base sequence data, it is clear that the J-base is not selected against
and is found in the conserved region. The original pool of DNA was randomized such
that 25% of each base should be represented in most of the sequences. In both the J-base
and standard selection, the J or T content is at 25% in the random regions. This leads us
to the third postulate:

Postulate 3: If the Km of the functionalized nucleotide is within one
standard deviation of that of its standard nucleotide counterpart, and its Vmax is only
moderately affected, then the functionalized nucleotide will not (necessarily) be
selected against during polymerase chain reaction amplification.

To recapitulate, the first general concept has generated three postulates that
validate the methodology for using functionalized bases for the purposes of PCR
amplification and in vitro selection. Specifically, we have (a) addressed the relationship
between functionalized nucleotides and different families of polymerases, (b) addressed
how and on what mechanisms these relationships are based, and (c) addressed whether or
not these relationships make functionalized nucleotides a viable tool for in vitro selection.
By validating the methodology used to pursue the hypothesis, we can now move on to the
hypothesis itself and proceed to the second general concept.

Functionality  Improves  Catalysis  (Second  Concept) 

The second general concept focuses on the in vitro selection and evolution
experiments done to create the catalytically active functionalized DNA molecules. The
second general concept can be stated: The addition of functionality (or added chemical
groups) to DNA improves the probability of finding better catalysts.

Using the Weibull distribution, we can state with statistical confidence that the
functionalized pool is 1.36 times more likely than the nonfunctionalized pool to yield
catalysts with a rate of 0.0047/min or better. This is achieved by the mere addition of a
positively charged lysine-like functional group to DNA.
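
A likelihood ratio of this kind can be illustrated with a short sketch. It assumes that each pool's rate constants follow a two-parameter Weibull distribution and compares the probability of exceeding a rate threshold in the two pools. The shape and scale values are hypothetical placeholders, not the fitted parameters from this work, so the printed ratio is not expected to reproduce the 1.36 figure.

```python
import math

def weibull_survival(k, shape, scale):
    """P(rate >= k) for a Weibull distribution with the given shape and scale."""
    return math.exp(-((k / scale) ** shape))

def enrichment(k, func_params, nonfunc_params):
    """Ratio of the probabilities that a catalyst has rate >= k in each pool."""
    return weibull_survival(k, *func_params) / weibull_survival(k, *nonfunc_params)

# Hypothetical (shape, scale) pairs; NOT the fitted values from this work.
functionalized = (0.8, 0.0020)
nonfunctionalized = (0.8, 0.0015)
threshold = 0.0047  # per minute, the rate threshold quoted in the text

ratio = enrichment(threshold, functionalized, nonfunctionalized)
print(f"enrichment at k >= {threshold}/min: {ratio:.2f}")
```

A ratio above 1 means that the functionalized pool is more likely to contain a catalyst at or above the threshold rate.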


Qualitatively, both a theoretically derived probability function,
a exp(-bk) - a exp(-bk) exp(-kt), and the Weibull distribution yield comparable profiles
of the distributions of catalysts in the endowed and nonendowed DNA
libraries. These two distributions clearly show that the initial distribution for the
functionalized pool has a greater number of faster catalysts than the nonfunctionalized pool.
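
One possible reading of the theoretically derived function is sketched below. It assumes that a exp(-bk) describes the initial abundance of catalysts with rate constant k, and that exp(-kt) is the fraction of those molecules that have not yet reacted after a selection time t, so the difference is the portion that has reacted. This interpretation, the function names, and the parameter values are all illustrative assumptions.

```python
import math

def initial_density(k, a, b):
    """Assumed exponential abundance of catalysts with rate constant k: a*exp(-b*k)."""
    return a * math.exp(-b * k)

def selected_density(k, a, b, t):
    """a*exp(-b*k) - a*exp(-b*k)*exp(-k*t): catalysts of rate k that have reacted by time t."""
    return initial_density(k, a, b) - initial_density(k, a, b) * math.exp(-k * t)

# Hypothetical parameters for illustration only.
a, b = 1.0, 500.0   # scale and decay of the assumed initial distribution
t = 60.0            # selection time in minutes

for k in (0.0005, 0.001, 0.005, 0.01):
    print(f"k={k:.4f}/min  initial={initial_density(k, a, b):.4f}  "
          f"selected={selected_density(k, a, b, t):.4f}")
```

Under this reading, shortening t lowers the selected fraction at every rate constant, and lowers it proportionally more for slow catalysts than for fast ones.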

A rigorous statistical analysis showed that the average rates of the
functionalized pool and the nonfunctionalized pool possessed confidence intervals that
did not overlap. This implied that the difference in rates between the functionalized and
nonfunctionalized pools was statistically significant and unlikely to be due to stochastic
uncertainty. Thus any subsequent transformation of the data, including the


statistically  significant. 
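
The non-overlap criterion can be sketched as follows. The example uses a normal-approximation confidence interval for each pool's mean rate; for samples this small a t-based interval would be more appropriate. The rate values are hypothetical placeholders, not measurements from this work.

```python
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(rates, confidence=0.95):
    """Normal-approximation confidence interval for the mean of a sample."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(rates) / len(rates) ** 0.5
    centre = mean(rates)
    return centre - half_width, centre + half_width

def intervals_overlap(ci_a, ci_b):
    """True when two (low, high) intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]

# Hypothetical rate constants (per minute); NOT data from this work.
functionalized = [0.0051, 0.0047, 0.0062, 0.0055, 0.0049, 0.0058]
nonfunctionalized = [0.0031, 0.0036, 0.0029, 0.0040, 0.0033, 0.0035]

ci_f = mean_confidence_interval(functionalized)
ci_n = mean_confidence_interval(nonfunctionalized)
print("functionalized CI:", ci_f)
print("nonfunctionalized CI:", ci_n)
print("overlap:", intervals_overlap(ci_f, ci_n))
```

Non-overlapping intervals are a conservative indicator of a significant difference between the two means.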

The trends and behavior of the selection experiments have been predicted using
probability theory and subsequently measured with some degree of statistical
significance. We can now predict and measure the rate of optimization under the selection
pressure and the consequence of tightening the selection pressure on the population of
catalysts and, most importantly, quantify the improvement in finding a selectable
behavior or property in one type of biopolymer over another.


those used by Breaker and Joyce and the molecules yielded catalytic efficiencies
comparable to those obtained by their experiments, the sequences themselves selected
from the natural library were not similar to the sequences obtained by Breaker and Joyce.
This suggests that the landscape relating catalytic power to sequence space is sufficiently
rugged (Benner, 1988), the distribution of catalytic power across this landscape is
sufficiently sparse, and/or the selection pressure is not sufficiently strong, to enforce
convergence to the same sequences. This is different from the results obtained by parallel
experiments with DNA molecules that act as receptors for ATP (Battersby, 1999;
Huizenga, 1995), and suggests that catalysts are more difficult to find than receptors.

Also,  the  sequences  of  the  catalysts  containing  the  J-base  were  not  analogous  to 
those extracted from the natural DNA library. This suggested that the J-base contributed
uniquely  to  the  ability  of  these  molecules  to  catalyze  the  cleavage  of  an  RNA-DNA  linkage. 
To  confirm  this  suggestion,  the  catalyst  sequences  were  prepared  with  T replacing  J.  No 
catalytic  activity  was  detectable  in  these  molecules,  establishing  the  essential  nature  of  the 
J-base  for  catalysis. 

Finally, it is worth noting that throughout the selection, including in the modest
sample  of  sequences  examined  individually,  catalysts  derived  from  a library  containing  the 
J-base  outperformed  catalysts  derived  from  the  natural  DNA  library.  In  one  individual 
instance  (J729,  Table  3),  the  J sequence  performed  25  times  better  than  the  best  natural 
deoxyribozyme.  This  suggests  that  the  J-base  increases  the  intrinsic  ability  of  a library  to 
deliver  catalytic  power. 

Simply  put,  both  the  quantitative  data  and  qualitative  data  show  that  functionality 
improves  DNA  catalysis.  This  is  the  first  time  that  data  of  this  type  have  been  obtained  for 
any combinatorial experiment. The results are consequential for any effort to generate a
combinatorial solution to a technological problem, and for any discussion that suggests a
combinatorial origin for life on Earth.

In closing, this project has

of life. The methodology and experimental data for the

catalytically active functionalized DNA molecules have proven to be equally important. Both
concepts give us an idea of how feasible it is to do in vitro selection with functionalized
bases and, at the same time, offer some new insights into the molecular evolutionary aspects
of the origins of life. Functionalized DNA moieties may have existed at one time as
substrates for Archaeal thermophiles, or they could themselves have been self-replicating
molecules. The closer we come to obtaining a more complex molecule, the closer we come
to a self-sustaining one. The first step taken here makes it worth articulating the project
statement: A random pool of amino acid-like DNA molecules has been evolved under
Darwinian selection pressures to perform not only catalysis, but a quantifiable improvement
in the rate of catalysis over nonmodified catalytic DNA. As a result, a bridge between the
chemical system and life has been formed.


REFERENCES 


Golden, B. L., Gooding, A. R., Podell, E. R., and Cech, T. R. (1998). A Preorganized
Active Site in the Crystal Structure of the Tetrahymena Ribozyme. Science 282, 259-264.

Green, R., Switzer, C., and Noller, H. F. (1998). Ribosome-Catalyzed Peptide-Bond

Kim, Y., Eom, S. H., Wang, J. M., Lee, D. S., Suh, S. W., and Steitz, T. A. (1995). Crystal
Structure of Thermus aquaticus DNA Polymerase. Nature 376, 612-616.

Johnson, N. L., Kotz, S., and Balakrishnan, N. (1994). Continuous Univariate

Lazcano, A. and Miller, S. L. (1996). The Origin and Early Evolution of Life: Prebiotic
Chemistry, the Pre-RNA World, and Time. Cell 85, 793-798.

Lee, D. H., Severin, K., Yokobayashi, Y., and Ghadiri, M. R. (1997). Emergence of
Symbiosis in Peptide Self-Replication through a Hypercyclic Network. Nature 390, 591-

Lee, D. H., Granja, J. R., Martinez, J. A., Severin, K., and Ghadiri, M. R. (1996). A Self-
Replicating Peptide. Nature 382, 525.




Scott, W. G., Finch, J. T., and Klug, A. (1995). The Crystal Structure of an All-RNA
Hammerhead Ribozyme: A Proposed Mechanism for RNA Catalytic Cleavage. Cell 81,

Thompson, J. D., Higgins, D. G., and Gibson, T. J. (1994). CLUSTAL W: Improving the
Sensitivity of Progressive Multiple Sequence Alignment through Sequence Weighting,
Position-Specific Gap Penalties and Weight Matrix Choice. Nucleic Acids Research 22,


BIOGRAPHICAL  SKETCH 


Darwin Noel Ang was born in Stillwater, Oklahoma on December 9, 1971. He is

of  1990,  he  attended  the  Florida  State  University  where  he  received  his  B.S.  in  chemistry 
and  anthropology.  Originally  he  aspired  to  be  an  archaeologist  or  writer,  but  found  his 
passion in helping others, especially through research and medicine. In 1993 he received
FSU's President's Humanitarian Award.

In 1994, Darwin attended the University of Florida where he was accepted into
the M.D./Ph.D. Medical Scientist Training Program. After completing two years of
medical school, he began his graduate training under the guidance of Professor Steven A.
Benner. During his graduate studies, he was partially funded by the American Heart
Association Predoctoral Fellowship.

After receiving his Ph.D., Darwin plans to return to medical school to complete
his clinical training and graduate with the class of 2001. In the meantime, he enjoys
playing his piano and playing men's club soccer with his friends, where he serves as




